CHAPTER I


INTRODUCTION

This technical report is a digest of (mostly) theoretical results on
short term forecasting. The ubiquitous criterion is mean square error (MSE),
which is the most amenable to mathematical analysis, but in some cases
transformations of the process variables to be forecast imply other error
measures. This study is also a proselytization for the Kalman filter, which
together with its underlying vector process model is a powerful and flexible
algorithm that encompasses many other techniques. Most of the results
are original, albeit many are first order derivations; i.e., anyone with the
starting relations, culled from various sources, could generate and manipulate
the subsequent equations. A key point, however, is that an extensive synthesis
has not previously been made in the literature; for example, many
statisticians are unaware of Kalman's work, and control theoreticians and
engineers probably were not cognizant of the Box-Jenkins work (publication of
their classic book improved matters). To an extent, the study concerns
systematics: a scheme for classifying such well known forecasting algorithms
as exponential smoothing, moving averages, and regression line fitting by the
underlying processes for which these algorithms are optimal or sub-optimal.

The most useful and most discussed process model in this study, because
of its simplicity, flexibility, and robustness, is the Dynamic Mean. (The process
mean is a random walk and observations are corrupted by noise.) A first-order
Kalman filter is the optimal algorithm and can handle prior distributions
on the process mean and time varying noise variances. After
infinite observation time on the process ("steady-state"), the Kalman
filter acts like single exponential smoothing if the noise variances are
constant. In steady state, a moving average is a sub-optimal* algorithm
for the Dynamic Mean. Other useful descriptive models of a process often
may be changed to a Dynamic Mean by a transformation of the observed
process variable. This transformed variable, if observable, can then
be forecast, and an inverse transformation of this forecast will
in some sense be a "best" forecast of the original process variable.

*In our context, sub-optimal refers to methods which track generated
data of this model better than other models and which use the process
parameters in determining algorithm parameters, e.g., moving average
base period.

If the process D of interest is dependent upon an ancillary process,
a variable h of which is observable, the intuitive notion of using an
estimate of a rate variable D/h for forecasting D (if h can be forecast)
is shown in mathematical terms to improve performance under some error
measure. Kalman and weighted moving average filters on the rate or on
a function of the rate are found to be optimal and suboptimal respectively.

If the process D is dependent more indirectly on a secondary ancillary
variable g, which may be "more" observable than h, it may be more
efficacious to operate on the rate variable D/g. Relations among parameters
of the D, D/h, and D/g processes and among parameters of the subsequent
algorithms are found.

Expressions  have  been  developed  for  the  mean-square  error  when  moving 
averages  are  applied  to  the  observations  from  the  process  models. 

The emphasis of the mathematical details is on scalar representations
of the Kalman filter as applied to Dynamic Mean process models (no trend)
and Linear Growth process models (trend), with and without single ancillary
variables. Operating from various initial conditions, the Kalman filter
can encompass algorithms obtained from exact or approximate Bayesian
methods, linear regressions over time or over the ancillary variable, and
weighted moving averages. In steady-state the Kalman equations are
equivalent to Box-Jenkins algorithms for integrated moving average
processes (0,1,1) and (0,2,2), and to Wiener-Hopf filters in the frequency
domain (spectral analysis).

Future work of interest includes demonstrating that the forecasting
techniques for the Box-Jenkins general ARIMA model can be subsumed by the
general vector Kalman filter. Also desirable are comparisons of performance
when forecasting from a rate variable, say D/h, when the ancillary process
h can be predicted with limited accuracy, versus forecasting directly from
D. There is also need for detailed relations of error measures, for various
transformations of the process, to the MSE measure.
PART I

BASIC VARIABLES, FOUNDATIONS


CHAPTER II


DYNAMIC MEAN PROCESS

2.1 Equations, Algorithms, Relations

The following equations represent the Dynamic Mean model.

$$ y_n = x_n + \eta_n, \qquad E(\eta_n) = 0, \qquad \mathrm{Var}(\eta_n) = r_n^2 \tag{2.1} $$

$$ x_n = x_{n-1} + v_n, \qquad E(v_n) = 0, \qquad \mathrm{Var}(v_n) = q_n^2 \tag{2.2} $$

$$ E(\eta_i \eta_j) = E(v_i v_j) = 0 \quad (i \neq j), \qquad E(\eta_i v_j) = 0 $$

where

   $y_n$ = observed value of process at time n
   $x_n$ = mean of process at time n
   $\eta_n$ = additive random noise
   $v_n$ = additive random change in mean x

This model is sufficiently complex to explain short term trends in
a time series; the random walk process on x can generate these trends.
The initial value $m_0 = E(x_0)$ determines the long term process level.
Moving averages and single exponential smoothing work well on this
process.
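As an illustration of (2.1) and (2.2), the following minimal sketch generates one realization of a Dynamic Mean process. The function name `simulate_dynamic_mean`, the seed handling, and the use of Gaussian noise are assumptions made here for the example; the model itself fixes only the means and variances of the noise terms.

```python
import random

def simulate_dynamic_mean(n_periods, m0, q, r, seed=0):
    """Return (xs, ys) for x_n = x_{n-1} + v_n, y_n = x_n + eta_n.

    q and r are the standard deviations of v_n and eta_n (Var = q^2, r^2).
    Gaussian noise is an assumption; the model only fixes means and variances.
    """
    rng = random.Random(seed)
    x = m0
    xs, ys = [], []
    for _ in range(n_periods):
        x = x + rng.gauss(0.0, q)          # random walk in the mean, (2.2)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, r))   # noisy observation, (2.1)
    return xs, ys

xs, ys = simulate_dynamic_mean(200, m0=100.0, q=1.0, r=7.0)
print(len(ys), round(sum(ys) / len(ys), 2))
```

With q = 0 the mean never moves and the observations scatter around m0; increasing q relative to r produces the short term "trends" described above.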
Kalman Filter - 1st Order

This algorithm is optimal (see below) for the dynamic mean
process. The "confidence" placed on the most recent observation $y_n$ is
expressed by the value of the weight $G_n$, which is a function of the noise
variances.

$$ \hat{y}_n(\ell) = \hat{x}_n \tag{2.3} $$

$$ \hat{x}_n = \hat{x}_{n-1} + G_n \,(y_n - \hat{x}_{n-1}) \tag{2.4} $$

$$ G_{n+1} = \frac{q_{n+1}^2 + r_n^2 G_n}{q_{n+1}^2 + r_n^2 G_n + r_{n+1}^2} \tag{2.5} $$

where

   $\hat{x}_n$ = estimate of mean of process at end of period n
   $\hat{y}_n(\ell)$ = forecast at end of period n of the process value
      $\ell$ periods later
   $G_n$ = variable weighting of one-step ahead error

Initialization:

$$ \hat{x}_0 = \mu + G_0 \,(y_0 - \mu) \tag{2.6} $$

$$ G_0 = \frac{\tau^2}{\tau^2 + r_0^2} \tag{2.7} $$

where

   $y_0$ = initial observed value of process
   $\mu, \tau^2$ = mean and variance of a prior distribution on $x_0$
   $r_0^2$ = variance associated with $y_0$
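The recursion (2.3) - (2.7) can be sketched in a few lines. This is a minimal illustration, assuming constant noise variances (so the same q2 and r2 are used every period); the function name `kalman_dynamic_mean` and its argument names are hypothetical.

```python
import math

def kalman_dynamic_mean(ys, q2, r2, mu, tau2):
    """First-order Kalman filter, (2.3)-(2.7), for constant q^2 and r^2.

    Returns the estimates x_hat_n and the gains G_n for every period.
    The forecast for any lead time is simply the latest estimate, (2.3).
    """
    g = tau2 / (tau2 + r2)                        # (2.7)
    x_hat = mu + g * (ys[0] - mu)                 # (2.6)
    estimates, gains = [x_hat], [g]
    for y in ys[1:]:
        g = (q2 + r2 * g) / (q2 + r2 * g + r2)    # (2.5), constant variances
        x_hat = x_hat + g * (y - x_hat)           # (2.4)
        estimates.append(x_hat)
        gains.append(g)
    return estimates, gains

data = [100.0 + (i % 7) for i in range(300)]      # arbitrary illustrative data
est, gains = kalman_dynamic_mean(data, q2=1.0, r2=49.0, mu=100.0, tau2=25.0)
k = 49.0 / 1.0
print(round(gains[-1], 5), round((math.sqrt(1 + 4 * k) - 1) / (2 * k), 5))
```

The two printed numbers agree: with constant variances the gain settles to the steady-state weight of single exponential smoothing, whatever prior variance is used to start the filter.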
The Kalman filter minimizes the mean square error (MSE) in one-step
ahead forecasts, i.e.,

$$ E\,\bigl(y_n - \hat{y}_{n-1}(1)\bigr)^2 = V_{1,n} $$

and the estimator $\hat{x}_n$ is actually $E(x_n \mid y^n)$, where $y^n$ designates all past
history on y through period n. If the distributions are Gaussian, $\hat{x}_n$ is
also the maximum likelihood estimator.

Also note

$$ \mathrm{Var}(x_n \mid y^n) \equiv \sigma_n^2 = \frac{r_n^2\,(q_n^2 + \sigma_{n-1}^2)}{q_n^2 + \sigma_{n-1}^2 + r_n^2} = r_n^2 G_n \tag{2.8} $$

$$ V_{1,n} = \mathrm{Var}(x_{n-1} \mid y^{n-1}) + \mathrm{Var}(x_n \mid x_{n-1}) + \mathrm{Var}(y_n \mid x_n) = \sigma_{n-1}^2 + q_n^2 + r_n^2 \tag{2.9} $$

(2.9) is obtained from

$$ E\bigl[y_n - E(x_{n-1} \mid y^{n-1})\bigr]^2 = E\bigl[(y_n - x_n) + (x_n - x_{n-1}) + (x_{n-1} - E(x_{n-1} \mid y^{n-1}))\bigr]^2 $$

and may be viewed by the usual breakout of MSE of a forecast:

   $r_n^2$ = variance of process
   $\sigma_{n-1}^2$ = variance of forecast
   $q_n^2$ = bias squared
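Optimal Exponential Smoothing - Steady State Process

Let the noise variances be constant,

$$ r_n^2 = r^2, \qquad q_n^2 = q^2 \tag{2.10} $$

and let the process and the forecast thereof be far from the origin.
Then Harrison [5] shows that the following exponential smoothing type
algorithm minimizes $V_{1,n} = V_1$:

$$ \hat{y}_n(\ell) = \hat{x}_n $$

$$ \hat{x}_n = \hat{x}_{n-1} + G_\infty (y_n - \hat{x}_{n-1}) = \hat{x}_{n-1} + G_\infty e_n \tag{2.11} $$

$$ G_\infty = \bigl(\sqrt{1 + 4k} - 1\bigr) / 2k \tag{2.12} $$

$$ k = r^2 / q^2 \tag{2.13} $$

and

$$ V_1^* = r^2 / (1 - G_\infty) \tag{2.14} $$

where $e_n = y_n - \hat{x}_{n-1}$ is the one-step ahead error. For a non-optimal G,

$$ V_1 = (2 G r^2 + q^2) / \bigl(G (2 - G)\bigr) \tag{2.15} $$

$$ C = E(e_n e_{n-1}) = (1 - G) V_1 - r^2 \tag{2.16} $$

Also defining $V_L$ as the MSE of the cumulative L-step forecast,

$$ V_L = E\Bigl[\sum_{i=1}^{L} \bigl(y_{n+i} - \hat{y}_n(i)\bigr)\Bigr]^2 \tag{2.17} $$

then for optimal G

$$ V_L^* = L \Bigl[1 + (L-1) G_\infty + \frac{(L-1)(2L-1) G_\infty^2}{6}\Bigr] V_1^* \tag{2.18} $$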
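$V_1^*$ being defined by equation (2.14); and for non-optimal G,

$$ V_L = L \Bigl[1 + (L-1) G + \frac{(L-1)(2L-1) G^2}{6}\Bigr] V_1 + C \cdot A \tag{2.19} $$

where

$$ A = 2 \Bigl[(L-1) + G (L-1)^2 + \frac{G^2 L (L-1)(L-2)}{3}\Bigr] $$

and $C = (1-G)V_1 - r^2$ is the covariance of successive one-step errors.

All equations except (2.19) (see footnote 1) can be found in Harrison [5].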


Now note that (2.5), given (2.10), in steady-state becomes

$$ G_\infty = \frac{q^2 + r^2 G_\infty}{q^2 + r^2 (G_\infty + 1)} \tag{2.20} $$

The solution of (2.20) is exactly (2.12).
Also note in steady state (2.9) becomes

$$ V_1 = \sigma^2 + q^2 + r^2 = r^2 G_\infty + q^2 + r^2 \tag{2.21} $$

and (2.21) equals $r^2/(1 - G_\infty)$, i.e., (2.14), only when $G_\infty$ is given by (2.12).
Equation (2.21) and its solution for $G_\infty$ relate the Kalman expressions for
variances of error in steady state to the Harrison formulae.
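The agreement between the Kalman steady state and the Harrison formulae can be checked numerically. The sketch below is only an illustration; the function names `g_inf` and `v1_exp_smoothing` and the numerical values of r2 and q2 are chosen here for the example.

```python
import math

def g_inf(k):
    """Steady-state weight, (2.12), for k = r^2 / q^2."""
    return (math.sqrt(1.0 + 4.0 * k) - 1.0) / (2.0 * k)

def v1_exp_smoothing(g, r2, q2):
    """One-step MSE of exponential smoothing with any weight g, (2.15)."""
    return (2.0 * g * r2 + q2) / (g * (2.0 - g))

r2, q2 = 47.8333, 1.0
g = g_inf(r2 / q2)

# (2.20): g is a fixed point of the steady-state gain recursion
fixed_point = (q2 + r2 * g) / (q2 + r2 * (g + 1.0))
# (2.14) and (2.21) coincide at the optimal weight
v_harrison = r2 / (1.0 - g)
v_kalman = r2 * g + q2 + r2

print(round(g, 4), round(fixed_point, 4))
print(round(v_harrison, 3), round(v_kalman, 3), round(v1_exp_smoothing(g, r2, q2), 3))
```

Any other weight gives a larger value of (2.15), which is one way to see that (2.12) is the minimizing choice.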

Moving Average Algorithm

The algorithm is described by Equation (2.22):

$$ \hat{y}_n(\ell) = \hat{x}_n, \qquad \hat{x}_n = \frac{1}{B} \sum_{j=1}^{B} y_{n-j+1} \tag{2.22} $$

Now under conditions (2.10) of constant variances, we show in Appendix B
that

$$ V_1 = q^2 \Bigl[k + \frac{k}{B} + \frac{2B^2 + 3B + 1}{6B}\Bigr] \tag{2.23} $$

$$ V_L = L^2 V_1 - q^2 \Bigl[(L-1) L k - \frac{(L-1)(2L^2 - L)}{6}\Bigr] \tag{2.24} $$

Minimizing $V_1$ under continuity assumptions on the moving average (MA)
base B,

$$ \frac{dV_1}{dB} = q^2 \Bigl[-\frac{k}{B^2} + \frac{4B+3}{6B} - \frac{2B^2 + 3B + 1}{6B^2}\Bigr] = \frac{q^2}{6B^2}\bigl[-6k + 4B^2 + 3B - 2B^2 - 3B - 1\bigr] = 0 \tag{2.25} $$

From (2.25)

$$ B^* = \sqrt{\frac{1 + 6k}{2}} \tag{2.26} $$

Conversely the k for which B is suboptimal is $(2B^2 - 1)/6$.

Equations (2.26) and (2.12) are important in that if k is known for
the dynamic mean process, the best moving average algorithm uses a base
$B^*$ and the best exponential smoothing algorithm uses a smoothing constant
$G_\infty$. The choice of these parameters is not arbitrary, and they are certainly
not related by Brown's [2] expression $G = 2/(B+1)$.

1 To obtain (2.19), Equation (6.18) in [5] is summed over L steps and squared,
where only terms involving $e_j^2$ and $E(e_j e_{j+1})$ are non-zero.
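A short numerical check of (2.23) and (2.26) may help. The names `ma_v1` and `best_ma_base` are introduced here for illustration only; q2 defaults to 1 as an arbitrary scale.

```python
import math

def ma_v1(base, k, q2=1.0):
    """One-step MSE of a moving average with the given base, (2.23)."""
    return q2 * (k + k / base + (2 * base**2 + 3 * base + 1) / (6 * base))

def best_ma_base(k):
    """Continuous minimizer of (2.23), i.e. (2.26)."""
    return math.sqrt((1.0 + 6.0 * k) / 2.0)

k = (2 * 8**2 - 1) / 6          # the k for which B = 8 is the best base
print(round(k, 4), round(best_ma_base(k), 4))
print([round(ma_v1(b, k), 3) for b in (6, 7, 8, 9, 10)])
```

The printed list has its smallest entry at base 8, as (2.26) predicts for this k.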

2.2  Quantifications 

The  first  table  compares  the  smoothing  parameters  or  weights  obtained 
using  (2.12)  and  those  obtained  using  Brown's  relation  which  equates 
"average  age"  of  data  under  moving  average  and  exponential  smoothing 
procedures. 


2 An important special case of (2.24), for $q^2 = 0$ ($k = \infty$), gives $V_L = L^2 r^2/B + L r^2$.
TABLE 2.1: COMPARISON OF SMOOTHING PARAMETERS

    B      k = (2B^2 - 1)/6     G_inf (2.12)     G = 2/(B+1)

    -           0                  1                 -
    1            .1667              .8730           1
    2           1.1667              .5916            .6667
    4           5.1667              .3537            .4
    8          21.1667              .1950            .2222
   12          47.8333              .1345            .1538
  inf          inf                 0                0
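The body of Table 2.1 follows directly from (2.12), the relation k = (2B^2 - 1)/6, and Brown's relation G = 2/(B+1). A minimal sketch that recomputes the rows for B = 1, 2, 4, 8, 12 (the helper name `table_2_1_row` is made up for this example):

```python
import math

def table_2_1_row(base):
    """Return (B, k, G_inf, Brown's G) rounded as in Table 2.1."""
    k = (2 * base**2 - 1) / 6                       # k for which B is the best base
    g = (math.sqrt(1 + 4 * k) - 1) / (2 * k)        # (2.12)
    brown = 2 / (base + 1)                          # Brown's relation
    return base, round(k, 4), round(g, 4), round(brown, 4)

rows = [table_2_1_row(b) for b in (1, 2, 4, 8, 12)]
for row in rows:
    print(row)
```

In every row the weight from (2.12) is smaller than the one from Brown's relation, i.e. equating "average age" of data overweights the most recent observation for this process.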

Table 2.2 compares mean square forecast errors $V_1$, $V_L$: using optimal
weights $G_\infty$, $V_L$(opt); using MA base B, V(MAB); using $G = 2/(B+1)$, V(ESB).
For comparison across different k's, $r^2 + q^2$ was held constant = 48.833,
i.e., $q^2(1+k) = 48.833$ or q = 1 when k = 47.833.

TABLE 2.2: COMPARISON OF MEAN SQUARE ERRORS

     k       B*   Leadtime L   V_L(opt)   V(MA12)   V(ES12)   V(MA8)    V(ES8)

  47.8333    12       1          55.267    56.333    55.340    56.999    56.343
  47.8333    12       4         323.986   341.332   332.652   351.998   371.304
  21.1667     8       1          57.925    60.460    58.273    59.480    58.044
  21.1667     8       4         397.026   438.638   384.467   422.973   410.927
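The moving average and fixed-weight exponential smoothing entries of Table 2.2 for k = 47.8333 can be recomputed from (2.15), (2.16), (2.19), (2.23) and (2.24). The sketch below is illustrative: the function names are invented here, q^2 = 1 is taken as in the text, and the cumulative exponential smoothing formula is used in the form that keeps only the lag-one covariance of the one-step errors (the approximation described in footnote 1).

```python
def v1_ma(base, r2, q2):
    k = r2 / q2
    return q2 * (k + k / base + (2 * base**2 + 3 * base + 1) / (6 * base))   # (2.23)

def vl_ma(base, lead, r2, q2):
    k = r2 / q2
    return lead**2 * v1_ma(base, r2, q2) - q2 * (
        (lead - 1) * lead * k - (lead - 1) * (2 * lead**2 - lead) / 6)        # (2.24)

def v1_es(g, r2, q2):
    return (2 * g * r2 + q2) / (g * (2 - g))                                   # (2.15)

def vl_es(g, lead, r2, q2):
    v1 = v1_es(g, r2, q2)
    c = (1 - g) * v1 - r2                                                      # (2.16)
    a = 2 * ((lead - 1) + g * (lead - 1)**2
             + g**2 * lead * (lead - 1) * (lead - 2) / 3)
    head = lead * (1 + (lead - 1) * g + (lead - 1) * (2 * lead - 1) * g**2 / 6)
    return head * v1 + c * a                                                   # (2.19)

r2, q2 = 47.8333, 1.0
for lead in (1, 4):
    print(lead,
          round(vl_ma(12, lead, r2, q2), 3), round(vl_es(2 / 13, lead, r2, q2), 3),
          round(vl_ma(8, lead, r2, q2), 3), round(vl_es(2 / 9, lead, r2, q2), 3))
```

The printed values agree with the V(MA12), V(ES12), V(MA8) and V(ES8) columns of the first two rows to within rounding of the last digit.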

As k increases - the process becoming more stationary - the V's
decrease; it is easier to forecast a process where the random change
($q^2$ effect) in the mean is low. Across the rows the results are what one
might expect, with one exception; exponential smoothing with $G = 2/(12+1) =
.1538$ does better forecasting over 4 periods than with $G_\infty = .1950$ obtained
by minimizing MSE of 1 period forecasts for k = 21.1667. Reason: even
though $\hat{x}_n$ obtained after period n from (2.11) minimizes $E(y_{n+1} - \hat{x}_n)^2$,
$E(y_{n+2} - \hat{x}_n)^2$, ..., $E(y_{n+L} - \hat{x}_n)^2$, it does not necessarily minimize, for
L > 1,

$$ E\Bigl[\sum_{i=1}^{L} (y_{n+i} - \hat{x}_n)\Bigr]^2 $$

because of non-zero cross product terms. G = .1538 is more conservative
in weighting recent observations and the algorithm thereby benefits since
the mean may shift back to past values in the upcoming four periods.

Finally we note there is not a great degradation from using sub-optimal
algorithms. Specifically, for L = 1,

      k          100 [V_1(MAB*) - V_1(opt)] / V_1(opt)

   21.1667          2.68%
   47.8333          1.93%

The percentages increase for decreasing k; for k ~ 1 the degradation in
performance is ~5%.
CHAPTER III


MODIFICATIONS

3.1 Linear Growth Model

We augment the dynamic mean process with a growth term $\beta$ and random
variation $\delta$:

$$ x_n = x_{n-1} + \beta_n + v_n \tag{3.1} $$

$$ \beta_n = \beta_{n-1} + \delta_n $$

$$ E(\delta_n) = E(\delta_n \delta_m) = 0 \quad (n \neq m), \qquad \mathrm{Var}(\delta_n) = p_n^2 \tag{3.2} $$

and as before $\mathrm{Var}\,\eta_n = r_n^2$, $\mathrm{Var}\,v_n = q_n^2$.

This model allows for linear growth over time of the process mean; the
rate is more apparent if $E(\beta_0) \gg p$. Double exponential smoothing and
linear regression with time as the independent variable would do well.

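Kalman Filter - 2nd Order

The optimal Kalman filter for the linear growth model is given by
Equations (3.3) - (3.17).

$$ \hat{y}_n(\ell) = \hat{x}_n + \ell\, \hat{\beta}_n \tag{3.3} $$

$$ \hat{x}_n = \hat{x}_{n-1} + \hat{\beta}_{n-1} + G_n e_n \tag{3.4} $$

$$ \hat{\beta}_n = \hat{\beta}_{n-1} + H_n e_n \tag{3.5} $$

$$ e_n = y_n - \hat{y}_{n-1}(1) = y_n - (\hat{x}_{n-1} + \hat{\beta}_{n-1}) \tag{3.6} $$

$$ V_n = s_{11,n} + r_n^2 \tag{3.7} $$

$$ G_n = s_{11,n} / V_n \tag{3.8} $$

$$ H_n = s_{12,n} / V_n \tag{3.9} $$

$$ s_{11,n+1} = s_{11,n} - G_n^2 V_n + 2 s_{12,n} - 2 G_n H_n V_n + s_{22,n} - H_n^2 V_n + p_n^2 + q_n^2 \tag{3.10} $$

$$ s_{12,n+1} = s_{12,n} - G_n H_n V_n + s_{22,n} - H_n^2 V_n + p_n^2 \tag{3.11} $$

$$ s_{22,n+1} = s_{22,n} - H_n^2 V_n + p_n^2 \tag{3.12} $$

where

$$ \begin{pmatrix} s_{11,n} & s_{12,n} \\ s_{12,n} & s_{22,n} \end{pmatrix} = \mathrm{Cov}\left[ \begin{pmatrix} x_n \\ \beta_n \end{pmatrix} \Bigm| y^{n-1} \right] \tag{3.13} $$

Initialization:

$$ e_0 = y_0 - \mu \tag{3.14} $$

$$ \hat{\beta}_0 = b \tag{3.15} $$

$$ G_0 = \tau^2 / (\tau^2 + r_0^2) \tag{3.16} $$

$$ s_{11,0} = \tau^2, \qquad s_{22,0} = \tau_b^2, \qquad s_{12,0} = 0 \tag{3.17} $$

where $b, \tau_b^2$ = mean and variance of a prior distribution on $\beta_0$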
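The covariance and gain recursion (3.7) - (3.12) can be iterated on its own, since it does not depend on the data. The sketch below assumes constant variances r2, q2, p2; the function name `linear_growth_gains`, the starting covariances and the step count are illustrative choices. It then checks that the limiting gains satisfy Harrison's steady-state relations r^2 = (1 - G) V, q^2 = (G^2 + GH - 2H) V and p^2 = H^2 V.

```python
def linear_growth_gains(r2, q2, p2, s11, s12, s22, steps=2000):
    """Iterate (3.7)-(3.12) with constant variances; return (G, H, V)."""
    for _ in range(steps):
        v = s11 + r2                      # (3.7)
        g = s11 / v                       # (3.8)
        h = s12 / v                       # (3.9)
        # right-hand sides use the old s-values; assignment happens at once
        s11, s12, s22 = (
            s11 - g * g * v + 2 * s12 - 2 * g * h * v + s22 - h * h * v + p2 + q2,  # (3.10)
            s12 - g * h * v + s22 - h * h * v + p2,                                  # (3.11)
            s22 - h * h * v + p2,                                                    # (3.12)
        )
    v = s11 + r2
    return s11 / v, s12 / v, v

G, H, V = linear_growth_gains(r2=4.0, q2=1.0, p2=0.25, s11=10.0, s12=0.0, s22=1.0)
print(round(G, 4), round(H, 4), round(V, 4))
print(round((1 - G) * V, 6), round((G * G + G * H - 2 * H) * V, 6), round(H * H * V, 6))
```

The second printed line recovers the input variances (r2, q2, p2), which is a numerical version of the equivalence between the steady-state Kalman gains and the optimal double exponential smoothing weights.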
Optimal "Double" Exponential Smoothing - Steady State Process

As in Chapter II, let the noise variances be constant,

$$ r_n^2 = r^2, \qquad q_n^2 = q^2, \qquad p_n^2 = p^2 \tag{3.18} $$

and the forecasting procedure is in steady state. Then Harrison [5]
gives the following algorithm to minimize $V_1 = E(e_n^2) = E(y_n - \hat{y}_{n-1}(1))^2$:

$$ \hat{y}_n(\ell) = \hat{x}_n + \ell\, \hat{\beta}_n \tag{3.19} $$

$$ \hat{x}_n = \hat{x}_{n-1} + \hat{\beta}_{n-1} + G\, e_n \tag{3.20} $$

$$ \hat{\beta}_n = \hat{\beta}_{n-1} + H\, e_n \tag{3.21} $$

where G, H are optimal weights found by solving

$$ r^2 = (1 - G)\, V_1 \tag{3.22} $$

$$ q^2 = (G^2 + GH - 2H)\, V_1 \tag{3.23} $$

$$ p^2 = H^2 V_1 \tag{3.24} $$

In Appendix A, we show that (3.22) - (3.24) is equivalent to the steady
state solution of (3.7) - (3.12). G, H so obtained are quite different
from the Brown [2] algorithms $G = (1 - \pi^2)$, $H = (1 - \pi)^2$, where $\pi$ is an arbitrary
value.

A heuristic, sub-optimal algorithm was tested in Orr [9] by assuming
$p^2 = q^2/100$, $r^2/q^2 = k$. This expressed the feeling that the deviation in
growth is no more than 10% of deviation in process level. Then from (3.22),
(3.24)

$$ r^2/p^2 = (1 - G)/H^2 = 100k \tag{3.25} $$

from which

$$ H^2 = (1 - G)/100k \tag{3.26} $$

G was then updated using (2.5) under the approximation $H \approx 0$, and then an
estimate of H was $\sqrt{(1 - G)/100k}$.

Moving Average Algorithm - given (3.18) and p = 0

The linear growth model is assumed to have a constant growth $\beta$.
The algorithm is given by (2.22).

We show in Appendix B that

$$ V_{1,\beta} = V_{1,0} + \frac{\beta^2}{4} (B+1)^2 \tag{3.27} $$

$$ V_{L,\beta} = V_{L,0} + L^2 \frac{\beta^2}{4} (B+1)^2 + \frac{\beta^2}{4} \bigl[L^2 (L-1)^2 + 2 (B+1)(L-1)(L)(L+1)\bigr] \tag{3.28} $$

where

$$ V_{1,\cdot} = E\,(y_{n+1} - \hat{x}_n)^2 \tag{3.29} $$

$$ V_{L,\cdot} = E\Bigl[\sum_{\ell=1}^{L} (y_{n+\ell} - \hat{x}_n)\Bigr]^2 \tag{3.30} $$

   $V_{1,0}$, $V_{L,0}$ are given by (2.23) and (2.24)
   $V_{1,\beta}$, $V_{L,\beta}$ are MSE for the linear growth model with constant
      growth $\beta$

We see immediately how much worse MA algorithms do on linear growth
models compared to their performance on dynamic mean models - specifically
the last terms in (3.27) and (3.28). Expressions for the optimal base B are
quite complex, but one can see that the base period should be shortened
as $|\beta|$, L increase.

Note also when p = 0, from (3.22) - (3.24), that H = 0 and $V_1(\mathrm{opt}) = r^2/(1-G)$
using the (3.19), (3.20) algorithm. This value is lower than that for
a MA algorithm for any $\beta$, as expected, since (3.20) is the optimal steady
state algorithm.
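The extra term in (3.27) is the squared bias of a moving average on a trending mean. A noise-free check makes this concrete; the variable names and the values of beta and base below are arbitrary illustrations.

```python
def moving_average(ys, base):
    """Average of the last `base` observations, as in (2.22)."""
    return sum(ys[-base:]) / base

beta, base = 2.0, 8
ys = [beta * n for n in range(50)]        # noise-free linear growth, y_n = beta * n
forecast = moving_average(ys, base)       # forecast of y_50 made at n = 49
one_step_error = beta * 50 - forecast

print(one_step_error, beta * (base + 1) / 2)
print(one_step_error**2, beta**2 * (base + 1)**2 / 4)
```

The one-step error equals beta (B+1)/2, so its square is the term added to $V_{1,0}$ in (3.27); lengthening the base makes the lag, and hence this term, grow quadratically.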

3.2 Dynamic Proportion Model

This model is useful when expressing noise as a "percentage" of the
process level variables.

$$ z_n = u_n \rho_n, \qquad \rho_n > 0 \tag{3.31} $$

$$ u_n = u_{n-1} w_n, \qquad w_n > 0 $$

$$ E(\rho_n) = E(w_n) = 1 \tag{3.32} $$

where

   $z_n$ = observed value of process in period n
   $u_n$ = mean of process in period n
   $\rho_n$ = multiplicative noise random variable
   $w_n$ = multiplicative random change or "percentage" change
      in mean u from n-1 to n.

Unlike the Dynamic Mean, here we do not have the possibility of
"going negative" on the process variables.

Assume all variables are distributed log normally. Then

$$ E(\rho_n) = e^{\theta_\rho + r^2/2} \tag{3.33} $$

$$ E(w_n) = e^{\theta_w + q^2/2} \tag{3.34} $$

$$ \mathrm{Var}(\rho_n) = e^{2\theta_\rho + r^2} (e^{r^2} - 1) \tag{3.35} $$

$$ \mathrm{Var}(w_n) = e^{2\theta_w + q^2} (e^{q^2} - 1) \tag{3.36} $$

where $(\theta_\rho, r^2)$, $(\theta_w, q^2)$ are mean and variance of normal variates $\log \rho$,
$\log w$ respectively.

Constraint (3.32) applied to (3.33) - (3.36) yields

$$ \theta_\rho = -\tfrac{1}{2} r^2 \tag{3.37} $$

$$ \theta_w = -\tfrac{1}{2} q^2 \tag{3.38} $$

$$ \mathrm{Var}\,\rho = e^{r^2} - 1 \tag{3.39} $$

$$ \mathrm{Var}\,w = e^{q^2} - 1 \tag{3.40} $$

Letting $y = \log z$ in (3.31) we obtain


System (3.41) - (3.44) is a special case of the linear growth model
with $\beta$ = constant = $-\tfrac{1}{2} q^2$; previous algorithms are then suitable for
forecasting the transformed time series $y = \log z$, from which $z = e^{y}$.
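One way to write the transformed system, using only (3.31), (3.32), (3.37) and (3.38), is sketched below. The symbols $x_n$, $\tilde{\eta}_n$ and $\tilde{v}_n$ are introduced here for illustration, and the sketch is not claimed to match the numbered equations line for line.

```latex
\begin{aligned}
y_n &= \log z_n = x_n + \log \rho_n, & x_n &\equiv \log u_n, \\
x_n &= x_{n-1} + \log w_n, \\
\log \rho_n &= -\tfrac{1}{2} r^2 + \tilde{\eta}_n, & E(\tilde{\eta}_n) &= 0, \quad \mathrm{Var}(\tilde{\eta}_n) = r^2, \\
\log w_n &= -\tfrac{1}{2} q^2 + \tilde{v}_n, & E(\tilde{v}_n) &= 0, \quad \mathrm{Var}(\tilde{v}_n) = q^2 .
\end{aligned}
```

Substituting the last line into the second gives $x_n = x_{n-1} + \beta + \tilde{v}_n$ with $\beta = -\tfrac{1}{2} q^2$, which has the form of (3.1) with a constant growth term (p = 0); the constant $-\tfrac{1}{2} r^2$ in the observation equation is a fixed offset of the level.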
If $r^2$, $q^2$ are small, we can relax the assumption of log-normality,
since the initial relations approximately hold. For example:

   $\log \rho \approx (\rho - 1) - \tfrac{1}{2} (\rho - 1)^2$ for $0 < \rho < 2$

   if $E(\rho) = 1$, $E(\log \rho) \approx 0 - \tfrac{1}{2} \mathrm{Var}(\rho) \approx -\tfrac{1}{2} r^2$

   $\mathrm{Var}(\log \rho) \approx \mathrm{Var}(\rho - 1) = \mathrm{Var}\,\rho$, which is satisfied since $r^2 \approx e^{r^2} - 1$
PART II

ANCILLARY VARIABLES, TRANSFORMATIONS

CHAPTER IV


PRIMARY ANCILLARIES, COMPARATIVE ANALYSIS OF TRANSFORMATIONS

In this chapter we shall consider models of an observed process $D_n$
(e.g. demand for a repair part), where $D_n$ is related to an independent
variable $h_n$ (e.g. hours flown by aircraft using the part). Relations
among the parameters of the processes $D_n$, $D_n/h_n$, $\log D_n$ and $\log D_n/h_n$
will be explored.


4.1 D Versus D/h

If one assumes that the noise terms in equation (2.1) are proportional
to the process levels x (which is most reasonable), one obtains quite
powerful and flexible versions of the basic model. Also, with the incorporation
of the ancillary variable $h_n$ into the model, the performance of two
Dynamic Mean algorithms can be compared.

$$
\begin{aligned}
D_n &= x_n (1 + \eta_n) && \text{(a)} \\
x_n &= a_n h_n && \text{(b)} \\
a_n &= a_{n-1} (1 + \varepsilon_n) && \text{(c)} \\
h_n &= h_{n-1} (1 + \xi_n) && \text{(d)}
\end{aligned}
\tag{4.1}
$$

where

   $D_n$ = basic observable
   $x_n$ = mean of process
   $a_n$ = fluctuating rate of $x_n/h_n$
   $h_n$ = ancillary variable
   $\eta_n, \varepsilon_n, \xi_n$ = noise terms with $E(\cdot) = 0$, $\mathrm{Var}(\cdot) = V_{\cdot}$

From (4.1)(b), (c), (d),

$$ x_n = x_{n-1} (1 + \varepsilon_n + \xi_n + \varepsilon_n \xi_n) \tag{4.1)(e} $$
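Note (4.1)(a) and (4.1)(e) is a form of dynamic mean model. A similar
pair of equations can be obtained for the observable $D_n/h_n$, i.e.,

$$ D_n/h_n = a_n (1 + \eta_n), \qquad a_n = a_{n-1} (1 + \varepsilon_n) \tag{4.2} $$

Now we determine the k factors for $D_n$ and $D_n/h_n$. First note that

$$ \mathrm{Var}(C_T S_T) = E\,\mathrm{Var}(C_T S_T \mid C_T) + \mathrm{Var}\,E(C_T S_T \mid C_T) = E(C_T^2)\, V_S + 0 \tag{4.3} $$

where $S_T$ is a noise variable $\eta$, $\varepsilon$, $\xi$ and $C_T$ is any of the multiplier
variables in (4.1), (4.2). Hence

$$ k_{D/h} = \frac{\mathrm{Var}(a_n \eta_n)}{\mathrm{Var}(a_{n-1} \varepsilon_n)} = \frac{E(a_n^2) V_\eta}{E(a_{n-1}^2) V_\varepsilon} = \frac{E(a_{n-1}^2)(1 + V_\varepsilon) V_\eta}{E(a_{n-1}^2) V_\varepsilon} = \frac{V_\eta}{V_\varepsilon} + V_\eta \tag{4.4} $$

$$ k_D = \frac{\mathrm{Var}(x_n \eta_n)}{\mathrm{Var}\bigl(x_{n-1} (\varepsilon_n + \xi_n + \varepsilon_n \xi_n)\bigr)} = \frac{E(x_n^2) V_\eta}{E(x_{n-1}^2) V_\nu} = \frac{E(x_{n-1}^2)(1 + V_\nu) V_\eta}{E(x_{n-1}^2) V_\nu} = \frac{V_\eta}{V_\nu} + V_\eta \tag{4.5} $$

with $V_\nu = V_\varepsilon + V_\xi + V_\varepsilon V_\xi$ the variance of $\nu_n = \varepsilon_n + \xi_n + \varepsilon_n \xi_n$.

Typically $V_\eta$ is quite small (since $\eta$ is a percentage change from mean),
so only the first terms of (4.4) and (4.5) are significant.

$$ \frac{k_{D/h}}{k_D} \approx \frac{V_\varepsilon + V_\xi + V_\varepsilon V_\xi}{V_\varepsilon} \tag{4.6} $$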
Therefore $k_{D/h}$ is larger than $k_D$ and can be much larger if $V_\xi \gg V_\varepsilon$.
Let us now determine the impact on the long term mean square error under
optimal steady state algorithms as given by equation (2.14). For the
D process

$$ V^*_{1,D} = E\,(D_n - \hat{D}_n)^2 = \frac{E(x_n^2)\, V_\eta}{1 - G_D} \tag{4.7} $$

For the D/h process

$$ V^*_{1,D/h} = E\,\bigl(D_n/h_n - \widehat{(D/h)}_n\bigr)^2 = \frac{E(a_n^2)\, V_\eta}{1 - G_{D/h}} \tag{4.8} $$

The L.H.S. of (4.7) and (4.8) differ by a factor $E(h_n^2)$ (or $E(1/h_n^2)$,
depending on the viewpoint) for comparison of forecast errors. Hence
for comparative purposes, we compare $V^*_{1,D}$ to $E(h_n^2) \cdot V^*_{1,D/h}$:

$$ \frac{FE^{[D/h]}}{FE^{[D]}} = \frac{1 - G_D}{1 - G_{D/h}} \tag{4.9} $$

where

   $FE^{[P]}$ = squared forecast error of basic observable D using
      the optimal algorithm on process P

Since $k_{D/h} > k_D$, (2.12) gives $G_{D/h} < G_D$ and hence $FE^{[D/h]} < FE^{[D]}$.
In particular as $k \to \infty$, $G \to k^{-1/2}$. If $k_D = 16$ and $k_{D/h} = 81$, then

$$ \frac{1 - 1/4}{1 - 1/9} = \frac{27}{32} $$

or about 15% improvement, using the D/h variable (exact ratio is .871).

Result using actual demand data: From Orr [9], mean absolute errors
averaged over a group of items give a ratio which is then squared,

$$ \left(\frac{.808}{.942}\right)^2 \approx .736 $$

For this group of items $k_D = 4.251$, $k_{D/h} = 14.18$, from which

$$ \frac{1 - G_D}{1 - G_{D/h}} = \frac{1 - .38145}{1 - .23263} = \frac{.61854}{.76737} \approx .806 $$

This section and succeeding sections of Chapter IV demonstrate
mathematically the intuitive notion that utilizing the ancillary variable,
if it is in fact imbedded in the model which properly describes the
process, should improve forecasting performance. Also the mathematical
groundwork is laid for Chapter V, where only a secondary ancillary is
available for observation.
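The ratio (4.9) is easy to evaluate for any pair of k factors. The sketch below uses the two numerical cases quoted in this section; the function names `g_inf` and `fe_ratio` are chosen here for illustration.

```python
import math

def g_inf(k):
    """Steady-state weight (2.12)."""
    return (math.sqrt(1.0 + 4.0 * k) - 1.0) / (2.0 * k)

def fe_ratio(k_d, k_dh):
    """Ratio (4.9) of squared forecast errors: using D/h versus using D."""
    return (1.0 - g_inf(k_d)) / (1.0 - g_inf(k_dh))

print(round(fe_ratio(16.0, 81.0), 4), 27 / 32)      # exact ratio vs large-k approximation
print(round(g_inf(4.251), 5), round(g_inf(14.18), 5), round(fe_ratio(4.251, 14.18), 3))
```

The first line compares the exact ratio with the approximation $G \approx k^{-1/2}$; the second reproduces the weights .38145 and .23263 for the demand data.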

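4.2 D vs log D

As in section 4.1,

$$ D_n = x_n (1 + \eta_n), \qquad x_n = x_{n-1} (1 + \nu_n) \tag{4.10} $$

from which, as in (4.5),

$$ k_D = \frac{(1 + V_\nu)\, V_\eta}{V_\nu} \tag{4.11} $$

Comparing (4.10) with (3.31), $1 + \eta_n = \rho_n$, $(1 + \nu_n) = w_n$ and therefore

$$ V_\eta = \mathrm{Var}\,\rho = e^{r^2} - 1, \qquad V_\nu = \mathrm{Var}\,w = e^{q^2} - 1 \tag{4.12} $$

From (3.39) and (3.40), equation (4.11) becomes

$$ k_D = \frac{e^{q^2} (e^{r^2} - 1)}{e^{q^2} - 1} \tag{4.13} $$

From section 3.2 we know

$$ k_{\log D} = r^2 / q^2 \tag{4.14} $$

$r^2$ and $q^2$ are typically small, and

$$ k_D \approx \frac{r^2}{q^2} (1 + q^2), \qquad \text{using } e^{s^2} \approx 1 + s^2 \tag{4.15} $$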

A comparison of $FE^{[D]}$ vs $FE^{[\log D]}$ cannot be made directly, since the
optimal algorithms using $k_D$, $k_{\log D}$ are minimizing different error
measures, i.e., $G_{\log D}$ is minimizing $E(\log D - \log F)^2$, which is related
to relative measures such as $E\bigl((D - F)/D\bigr)^2$, where F is the forecasted D obtained
by an inverse transformation of the "best" forecast of log D.


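4.3 log D versus log D/h

For purpose of this comparison, let's assume the processes can be
represented with dynamic proportion models

$$
\begin{aligned}
D_n &= u_n \rho_n && \text{(a)} \\
u_n &= a_n h_n && \text{(b)} \\
a_n &= a_{n-1} \zeta_n && \text{(c)} \\
h_n &= h_{n-1} \tau_n && \text{(d)} \\
u_n &= u_{n-1} w_n \quad \text{where } w_n = \zeta_n \tau_n && \text{(e)}
\end{aligned}
\tag{4.16}
$$

where $\rho_n$, $\zeta_n$, $\tau_n$ are noise variables with expected value = 1.

Now

$$ D_n / h_n = a_n \rho_n \tag{4.17} $$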
Making the log transformation

$$ \log D_n = \log u_n + \log \rho_n \tag{4.18} $$

$$ \log u_n = \log u_{n-1} + \log \zeta_n + \log \tau_n $$

$$ \log (D_n/h_n) = \log a_n + \log \rho_n \tag{4.19} $$

$$ \log a_n = \log a_{n-1} + \log \zeta_n $$

Letting $U_{\cdot} = \mathrm{Var}(\log(\cdot))$ one finds

$$ k_{\log D} = \frac{U_\rho}{U_\zeta + U_\tau} \tag{4.20} $$

$$ k_{\log D/h} = \frac{U_\rho}{U_\zeta} \tag{4.21} $$

In this case we can make comparisons since

$$ V_{1,\log D} = E\,(\log D - \log F)^2 = E\,\bigl(\log (D/F)\bigr)^2 \tag{4.22} $$

$$ V_{1,\log D/h} = E\,(\log D/h - \log F/h)^2 = E\,\bigl(\log (D/F)\bigr)^2 \tag{4.23} $$

where an inverse transformation is made on the optimal forecast
log F/h to obtain the forecast F of D. Since $k_{\log D/h} > k_{\log D}$, analogously
to the analysis in section 4.1, we find $V_{1,\log D/h}$ to be smaller, i.e.

$$ \frac{V_{1,\log D/h}}{V_{1,\log D}} = \frac{1 - G_{\log D}}{1 - G_{\log D/h}} \tag{4.24} $$

where

$$ G_{\cdot} = \bigl((1 + 4k_{\cdot})^{1/2} - 1\bigr) / 2k_{\cdot} \tag{4.25} $$
In Orr [9], it was found that for items with higher activity
(larger D), forecasting from the log D transformation did better. Arguing
from equation (2.14), this could be explained by a modification of the
modelling in this section, where the mean rate of log D/h is quite stable
for these active items, but the total inherent noise in the process
is then encompassed by a higher variance $r^2$; this would make $r^2/(1-G)$
relatively larger than for the log D process, where $r^2$ could be assumed
smaller with the other variation being explained by fluctuations in the
mean of log D.

4.4 D/h versus log D/h

For D/h: The assumed model is

$$ D_n/h_n = a_n (1 + \eta_n), \qquad a_n = a_{n-1} (1 + \varepsilon_n) \tag{4.26} $$

For log D/h: The assumed model is

$$ D_n/h_n = a_n \rho_n, \qquad a_n = a_{n-1} \zeta_n \tag{4.27} $$

We have shown for D/h:


x1  - Var  (a J’j  ) - E(a^)  V 
n n n n « 


<n  • V*r  <VlV  * '<*»-!>  ’< 


and  for  log  D/h,  from  (3.39),  (3.40) 

A* 

Var  P - e - 1 
' n 


Var  * - a - 1 

n 
Also  from  (4.12)  Var  , Var!?n  • 


Similarly  to  section  4.2 


klog  D/h 


(4.28 


(4.29) 


A useful result is given for use in equation (2.5),

$$ G_{n+1} = \frac{q_{n+1}^2 + r_n^2 G_n}{q_{n+1}^2 + r_n^2 G_n + r_{n+1}^2} $$

If $r_n^2 \propto 1/h_n^2$ and $E(a_n^2)$ is independent of h (both reasonable) then

$$ V_\eta \approx c / h_n^2 $$

and, for small noise variance, the corresponding variance $\lambda_n^2$ of $\log \rho_n$
satisfies $\lambda_n^2 \approx c/h_n^2$. Therefore $r_n^2 h_n^2$ and $\lambda_n^2 h_n^2$ are constant and
equation (2.5) can be written in terms of $k = k_{D/h}$ or $k_{\log D/h}$ as

$$ G_{n+1} = \frac{1 + k\, G_n}{1 + k\, G_n + k \cdot h_n^2 / h_{n+1}^2} \tag{4.30} $$

Therefore the weight $G_n$ applied to current observations when forecasting
D/h or log D/h increases in some relation to the current value squared of
the ancillary variable $h_n$.
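The behaviour of (4.30) can be illustrated with a one-line update. In this sketch k is treated as a given constant and the function name `next_gain` is hypothetical; the values of k and h are arbitrary.

```python
import math

def next_gain(g, k, h_now, h_next):
    """Gain update (4.30) when the observation variance scales as 1 / h^2."""
    return (1.0 + k * g) / (1.0 + k * g + k * (h_now / h_next) ** 2)

k = 16.0
g = 0.5
for _ in range(500):                       # constant h: settles at the (2.12) value
    g = next_gain(g, k, 3.0, 3.0)

steady = (math.sqrt(1.0 + 4.0 * k) - 1.0) / (2.0 * k)
print(round(g, 6), round(steady, 6))
print(round(next_gain(g, k, 3.0, 6.0), 4), round(next_gain(g, k, 3.0, 1.5), 4))
```

With constant h the recursion reduces to the steady-state weight of (2.12). A period with twice the usage raises the weight on that period's observation, and a period with half the usage lowers it.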
CHAPTER V

SECONDARY ANCILLARIES

5.1 Focus

In this chapter, we discuss the use of secondary ancillaries -
independent variables which are less directly related than the primaries
to the basic observable being forecast. In many cases one must utilize
observations of these secondary variables when the primary data are not
available. For example, $D_n$, the basic variable being forecast, is demand
for a repair part; $h_n$, the primary variable, is a measure of usage such
as total hours flown by airplanes using the part; and the secondary
ancillary, designated $g_n$, might be the number of these airplanes ("density")
available for flight during the period n. Usage data may not be measured
or retained on file, whereas the density information is readily available.

We have found, in general, that it is desirable to apply optimal algorithms
to processes which have high k-factors. If there exists a relation
between demand D and usage h, then $k_{D/h} > k_D$. If, in turn, there is a
dependency of usage on density g, in that a certain percentage of the
variance in h can be explained by variance in g, then variation in D is
due in part to density g. However the direct fluctuation of D with h
has been obscured and more noise is present, and hence D/g is a less
stationary or stable "rate" variable than is D/h. We would expect to
find these inequalities,

$$ k_{D/h} > k_{D/g} > k_D $$

We know from Chapter IV that forecasting from a process with relatively
higher k yields better performance in projecting the ultimate variable D.
Future work can be done to compare the relative performances for D/h, D/g,
D when:

i) g is observable over all past periods and h is only intermittently
observable.
ii) The models of the ancillary processes g and h are such that their
forecasts for future periods are limited in accuracy and hence

   $\widehat{(D/h)}$ x forecasted h and
   $\widehat{(D/g)}$ x forecasted g

are degraded forecasts of the variable D.

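5.2 Model & Comparison of K-factors

Let us rewrite equations (4.1) and (4.2) and append some similar
modeling on a secondary variable g.

$$
\begin{aligned}
D_n &= x_n (1 + \eta_n) && \text{(a)} \\
x_n &= a_n h_n && \text{(b)} \\
a_n &= a_{n-1} (1 + \varepsilon_n) && \text{(c)} \\
h_n &= h_{n-1} (1 + \xi_n) && \text{(d)} \\
h_n &= b_n g_n && \text{(e)} \\
g_n &= g_{n-1} (1 + \omega_n) && \text{(f)} \\
b_n &= b_{n-1} (1 + \theta_n) && \text{(g)}
\end{aligned}
\tag{5.1}
$$

where $b_n$ = fluctuating rate coefficient for usage per density.

All noise terms, $\eta_n \ldots \theta_n$, have $E(\cdot) = 0$, $\mathrm{Var}(\cdot) = V_{\cdot}$. The
equations below follow immediately from the above.

$$
\begin{aligned}
x_n &= x_{n-1} (1 + \varepsilon_n + \xi_n + \varepsilon_n \xi_n) && \text{(h)} \\
D_n / h_n &= a_n (1 + \eta_n) && \text{(i)} \\
h_n &= h_{n-1} (1 + \theta_n + \omega_n + \omega_n \theta_n) && \text{(j)} \\
D_n / g_n &= a_n b_n (1 + \eta_n) && \text{(k)} \\
a_n b_n &= a_{n-1} b_{n-1} (1 + \varepsilon_n + \theta_n + \varepsilon_n \theta_n) && \text{(l)}
\end{aligned}
$$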
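For reference we rewrite (4.4) and (4.5)

$$ k_{D/h} = \frac{V_\eta}{V_\varepsilon} + V_\eta \approx \frac{V_\eta}{V_\varepsilon} \tag{5.2} $$

$$ k_D = \frac{V_\eta}{V_\varepsilon + V_\xi + V_\varepsilon V_\xi} + V_\eta \approx \frac{V_\eta}{V_\varepsilon + V_\xi + V_\varepsilon V_\xi} \tag{5.3} $$

Similarly,

$$ k_{D/g} = \frac{V_\eta}{V_\varepsilon + V_\theta + V_\varepsilon V_\theta} + V_\eta \approx \frac{V_\eta}{V_\varepsilon + V_\theta + V_\varepsilon V_\theta} \tag{5.4} $$

From the (5.2), (5.3), (5.4) approximations,

$$ 1/k_D = 1/k_{D/h} + \frac{V_\xi + V_\varepsilon V_\xi}{V_\eta} \tag{5.5} $$

$$ 1/k_{D/g} = 1/k_{D/h} + \frac{V_\theta + V_\varepsilon V_\theta}{V_\eta} \tag{5.6} $$

Equations (5.5) and (5.6) give

$$ \frac{1/k_D - 1/k_{D/h}}{1/k_{D/g} - 1/k_{D/h}} = \frac{V_\xi (1 + V_\varepsilon)}{V_\theta (1 + V_\varepsilon)} = \frac{V_\theta + V_\omega + V_\omega V_\theta}{V_\theta} \tag{5.7} $$

The R.H.S. of (5.7) was obtained from (5.1)(d) and (j), whence
$\xi = \theta + \omega + \omega\theta$.

We now derive a formula from (5.7) for obtaining $k_{D/g}$ when $k_D$ and
$k_{D/h}$ are known.

Equation (5.7) can be rewritten

$$ \frac{k_{D/g} (k_{D/h} - k_D)}{k_D (k_{D/h} - k_{D/g})} = 1 + \frac{V_\omega}{V_\theta} (1 + V_\theta) = 1 + F \tag{5.8} $$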
Let $F = (V_\omega / V_\theta)(1 + V_\theta)$ and solve (5.8) for $k_{D/g}$:

$$ k_{D/g} = \frac{(1 + F)\, k_{D/h}\, k_D}{k_{D/h} + F\, k_D} \tag{5.9} $$

Note if $V_\omega = 0$, then F = 0 and $k_{D/g} = k_D$. This shows that if
density $g_n$ does not vary, it is useless as an aid to forecasting and
one may just as well operate on $D_n$ as $D_n/g_n$.

Note as $V_\theta \to 0$, $F \to \infty$ and $k_{D/g} \to k_{D/h}$,
i.e., if $b_n$ is a constant then $g_n$ is as good an auxiliary variable
as $h_n$, and $D_n/g_n$ forecasts as well as $D_n/h_n$.

Equation (5.9) was utilized in conjunction with empirical estimates
of $V_\omega$, $V_\theta$ to obtain the $k_{D/g}$ factors in Orr [9].

5.3  Estimates  and  Empirical  Results 
From (5.1)(f),

    $V_\omega\, g_{n-1}^2 = \mathrm{Var}(g_n - g_{n-1})$

and using equation (4.3),

    $V_\omega = \frac{\mathrm{Var}(g_n - g_{n-1})}{E(g_{n-1}^2)}$        (5.10)

An estimator for (5.10) using a series of N observations on g is

    $\hat{V}_\omega = \frac{\frac{1}{N-2}\sum_{i=1}^{N-1}(g_{i+1} - g_i)^2}{\frac{1}{N}\sum_{i=1}^{N} g_i^2}$        (5.11)

In Orr [8] there were 20 time series on aircraft density for the
10,000 repair parts investigated, and $\hat{V}_\omega$ was averaged over the 20 estimates
to give $\bar{V}_\omega$.
From (5.1)(e) and (g),

    $V_\varepsilon = \frac{\mathrm{Var}(b_n - b_{n-1})}{E(b_{n-1}^2)}$,   with   $\mathrm{Var}(b_n - b_{n-1}) = \mathrm{Var}(h_n/g_n - h_{n-1}/g_{n-1})$        (5.12)

    $\hat{V}_\varepsilon = \frac{\frac{1}{N-2}\sum_{i=1}^{N-1}(h_{i+1}/g_{i+1} - h_i/g_i)^2}{\frac{1}{N}\sum_{i=1}^{N}(h_i/g_i)^2}$        (5.13)

Again in Orr [8], a refined estimate $\bar{V}_\varepsilon$ was used.

Empirical Results: Data used by Orr [8] gives

    $\hat{V}_\omega = .01526$,   $\hat{V}_\varepsilon = .02613$

From (5.8), F = .6 and hence

    $k_{D/g} = \frac{1.6\, k_D\, k_{D/h}}{k_{D/h} + .6\, k_D}$

$k_{D/g}$ is at most 1.6 times larger than $k_D$ even though on this data $k_{D/h}$
could become quite large. The coefficient of determination $R^2$ equaled .72
in a regression of usage h versus density g on this data. A relation can
be derived between $R^2$ and $V_\omega$, $V_\varepsilon$ but is not presented here; the value of
.72 is consistent with the estimates of $V_\omega$, $V_\varepsilon$.
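
As a numerical illustration of the k-factor comparison, the sketch below evaluates F and $k_{D/g}$. It assumes that F has the form $(V_\omega/V_\varepsilon)(1 + V_\varepsilon)$ and that (5.9) reads $k_{D/g} = (1+F)\,k_D\,k_{D/h} / (k_{D/h} + F\,k_D)$, which is the reading suggested by the empirical expression with F = .6. The function names and the sample k values are hypothetical.

```python
def f_ratio(v_omega: float, v_eps: float) -> float:
    """F = (V_omega / V_eps) * (1 + V_eps); assumed form of the ratio in (5.8)."""
    return (v_omega / v_eps) * (1.0 + v_eps)


def k_d_over_g(k_d: float, k_dh: float, f: float) -> float:
    """k_{D/g} = (1 + F) k_D k_{D/h} / (k_{D/h} + F k_D); assumed form of (5.9)."""
    return (1.0 + f) * k_d * k_dh / (k_dh + f * k_d)


# Empirical variance estimates quoted in Section 5.3.
F = f_ratio(0.01526, 0.02613)

# Hypothetical k-factors, only to show the limiting behaviour.
k_large_dh = k_d_over_g(2.0, 1e12, 0.6)   # k_{D/h} very large: approaches 1.6 * k_D
k_no_density_noise = k_d_over_g(2.0, 10.0, 0.0)  # F = 0: returns k_D
```

With F near .6 the bound $k_{D/g} \le 1.6\,k_D$ quoted in the text appears as the limit of large $k_{D/h}$.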

APPENDIX A


EQUIVALENT ALGORITHMS

In this appendix we show that seemingly different forecast algorithms
presented in the literature are equivalent in general or in special cases.
One would expect that if the underlying models of the process are
the same and/or generate the same observables in a statistical sense, and
if the minimizing criterion (e.g. mean square error of one-period forecast)
is the same, then the optimal algorithms can be shown to be identical. In the
following sections we do some pairwise comparisons of algorithms
developed or presented by Kalman [7], Harrison [5], Box and Jenkins [1],
Jewell [6], and of algorithms obtained by regression and spectral techniques.


A.1 Kalman Versus Harrison


We have shown in the main report, Section 2.1, that the Kalman gain $G_n$
approaches the $G_\infty$ in (2.12) given by Harrison for the dynamic mean model
with constant $r^2$, $q^2$. In steady-state operation, the constant weighting factor
$G_\infty$ minimizes the MSE $V_\infty = \sigma_\infty^2 + q^2 + r^2$ (from 2.21), where
$\sigma_\infty^2 = \mathrm{Var}(x_n \mid \text{all past history})$.

For the linear growth model of Section 3.1, we now show that the Kalman
2nd order weights $G_n$, $H_n$ approach $G$, $H$, which are solutions of Harrison's
equations (3.22), (3.23), (3.24).

In equations (3.7) - (3.12) drop the subscript n and find the steady-state
relations

    $V = s_{11} + r^2$                                                  (A1)
    $G = s_{11}/V$                                                      (A2)
    $H = s_{12}/V$                                                      (A3)
    $0 = -G^2 V + 2s_{12} - 2GHV + s_{22} - H^2 V + q^2 + p^2$          (A4)
    $0 = -GHV + s_{22} - H^2 V + p^2$                                   (A5)
    $0 = -H^2 V + p^2$                                                  (A6)

From (A6), (3.24) follows. (A6) and (A5) give $s_{22} = GHV$, and (A4)
now becomes

    $0 = -G^2 V + 2s_{12} - GHV + q^2$                                  (A7)

(A3) used in (A7) yields

    $0 = -G^2 V + 2HV - GHV + q^2$                                      (A8)

from which (3.23) follows. Inserting $s_{11} = GV$ from (A2) in (A1) easily
gives (3.22).
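
The steady-state argument can be checked numerically. The sketch below is an illustration, not the report's own procedure: it assumes that restoring the subscript n in (A4)-(A6) gives the covariance recursion of the 2nd order filter, so that each right-hand side is the change in $s_{11}$, $s_{12}$, $s_{22}$ per step. It then verifies that the limiting weights satisfy $r^2 = (1-G)V$, $q^2 = (G^2 + GH - 2H)V$ and $p^2 = H^2 V$. The variance values are hypothetical.

```python
def steady_state_linear_growth(r2, q2, p2, iterations=2000):
    """Iterate the assumed covariance recursion until it settles.

    Returns (V, G, H): one-step MSE and the two steady-state weights.
    """
    s11, s12, s22 = 1.0, 0.0, 1.0  # arbitrary positive starting covariance
    for _ in range(iterations):
        V = s11 + r2
        G = s11 / V
        H = s12 / V
        # Right-hand sides of (A4), (A5), (A6) read as increments.
        s11, s12, s22 = (
            s11 - G * G * V + 2 * s12 - 2 * G * H * V + s22 - H * H * V + q2 + p2,
            s12 - G * H * V + s22 - H * H * V + p2,
            s22 - H * H * V + p2,
        )
    V = s11 + r2
    return V, s11 / V, s12 / V


r2, q2, p2 = 1.0, 0.5, 0.1  # hypothetical noise variances
V, G, H = steady_state_linear_growth(r2, q2, p2)

# Residuals of the three Harrison relations; all should be close to zero.
residuals = (
    r2 - (1 - G) * V,
    q2 - (G * G + G * H - 2 * H) * V,
    p2 - H * H * V,
)
```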

A.2 Harrison Versus Box-Jenkins

In this section we show that the dynamic mean and linear growth
processes can be reformulated as integrated moving average (IMA) processes,
(0,1,1) and (0,2,2) respectively (Box-Jenkins notation). The steady-state
optimal algorithms in Sections 2.1 and 3.1 are equivalent to the forecast
procedures in B-J [1, pp. 144, 147].

Dynamic  Mean  vs  IMA  (0,1,1): 

Equation 2.1 can be expressed by

    $y_n = y_{n-1} + \eta_n - \eta_{n-1} + v_n$                         (A9)

    $w_n = y_n - y_{n-1} = \eta_n - \eta_{n-1} + v_n$                   (A10)

Expressed as an IMA, (A10) is

    $w_n = y_n - y_{n-1} = a_n - \theta a_{n-1}$,                       (A11)

where $a$ = random shock with $E(a) = 0$, $E(a^2) = \sigma_a^2$.
B-J forecasting from (A11) is given by

    $\hat{y}_{n-1}(1) = y_{n-1} - \theta a_{n-1}$                       (A12)

Box-Jenkins show that under optimal forecasting, minimizing MSE,
the residuals $a_n$ are the one-step-ahead forecast errors

    $y_n - \hat{y}_{n-1}(1) = a_n$                                      (A13)

Combining (A12), (A13),

    $\hat{y}_{n-1}(1) = y_{n-1} - \theta\,(y_{n-1} - \hat{y}_{n-2}(1))$
    $\qquad = \hat{y}_{n-2}(1) + (1-\theta)\,[y_{n-1} - \hat{y}_{n-2}(1)]$      (A14)

Note that (A14) is of the same form as (2.11) with $1 - \theta = G_\infty$.

Now from (A10), (A11),

    $E(w^2) = r^2 + r^2 + q^2 = (1 + \theta^2)\,\sigma_a^2$             (A15)
    $E(w_i w_{i-1}) = -r^2 = -\theta\,\sigma_a^2$                       (A16)

Therefore the autocorrelation of lag 1 is

    $\rho_1 = \frac{-r^2}{2r^2 + q^2} = \frac{-\theta}{1 + \theta^2}$   (A17)

Solving (A17) for $\theta$ in terms of $r^2$, $q^2$,

    $\theta_\pm = \frac{2r^2 + q^2}{2r^2} \pm \sqrt{\left(\frac{2r^2 + q^2}{2r^2}\right)^2 - 1}$

$\theta$ is required to be < 1. We ignore $\theta_+$, being > 1. With $k = r^2/q^2$,

    $\theta_- = \frac{2r^2 + q^2}{2r^2} - \sqrt{\frac{4r^2 q^2 + q^4}{4r^4}} = \frac{2k + 1}{2k} - \frac{\sqrt{4k + 1}}{2k}$      (A18)

and $1 - \theta_- = (\sqrt{4k+1} - 1)/2k$.

QED
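
A small numerical check of the dynamic mean / IMA (0,1,1) equivalence follows. It is a sketch under one stated assumption: the steady-state gain is obtained by iterating the usual scalar local-level recursion $G \leftarrow (kG + 1)/(kG + 1 + k)$ with $k = r^2/q^2$, which may differ in notation from the report's (2.5). The value of k is hypothetical.

```python
import math


def theta_from_k(k: float) -> float:
    """Smaller root of (A17) written in terms of k = r^2 / q^2."""
    return (2 * k + 1 - math.sqrt(4 * k + 1)) / (2 * k)


def steady_state_gain(k: float, iterations: int = 500) -> float:
    """Iterate the assumed scalar gain recursion to its fixed point."""
    g = 1.0
    for _ in range(iterations):
        g = (k * g + 1) / (k * g + 1 + k)
    return g


k = 4.0  # hypothetical ratio r^2 / q^2
theta = theta_from_k(k)
gain = steady_state_gain(k)

# Lag-1 autocorrelation computed from the process model and from the IMA form.
rho1_model = -k / (2 * k + 1)
rho1_ima = -theta / (1 + theta ** 2)
```

The fixed point of the recursion satisfies $kG^2 + G - 1 = 0$, whose positive root is $(\sqrt{4k+1} - 1)/2k$, the same expression as $1 - \theta_-$.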


Linear  Growth  vs  IMA  (0,2,2): 

Equation 3.1 can be expressed by

    $w_n = y_n - y_{n-1} = \eta_n - \eta_{n-1} + \beta_n + v_n$

    $Z_n = w_n - w_{n-1} = \eta_n - 2\eta_{n-1} + \eta_{n-2} + v_n - v_{n-1} + \delta_n$      (A19)

where $\delta_n = \beta_n - \beta_{n-1}$ is the growth-rate noise, with variance $p^2$.

Expressed as an IMA, (A19) is

    $y_n - 2y_{n-1} + y_{n-2} = Z_n = a_n - \theta_1 a_{n-1} - \theta_2 a_{n-2}$      (A20)

Box-Jenkins give an integrated form of their forecast which is structurally
equivalent to (3.19) - (3.21) with

    $G' = 1 + \theta_2$   and   $H' = 1 - \theta_1 - \theta_2$          (A21)

We now show that G', H' are solutions of (3.22)-(3.24). From (A19), (A20),

    $E(Z^2) = r^2 + 4r^2 + r^2 + q^2 + q^2 + p^2 = (1 + \theta_1^2 + \theta_2^2)\,\sigma_a^2$      (A22)

    $E(Z_i Z_{i-1}) = E[(\eta_i - 2\eta_{i-1} + \eta_{i-2} + v_i - v_{i-1} + \delta_i)(\eta_{i-1} - 2\eta_{i-2} + \eta_{i-3} + v_{i-1} - v_{i-2} + \delta_{i-1})]$
    $\qquad = E[(a_i - \theta_1 a_{i-1} - \theta_2 a_{i-2})(a_{i-1} - \theta_1 a_{i-2} - \theta_2 a_{i-3})]$
    $\qquad = -2r^2 - 2r^2 - q^2 = (-\theta_1 + \theta_1\theta_2)\,\sigma_a^2$      (A23)

Similarly

    $E(Z_i Z_{i-2}) = r^2 = -\theta_2\,\sigma_a^2$                      (A24)

From (A23), (A24) and (A22), the autocorrelations of lag 1, 2 are

    $\rho_1 = \frac{-4r^2 - q^2}{6r^2 + 2q^2 + p^2} = \frac{-\theta_1(1 - \theta_2)}{1 + \theta_1^2 + \theta_2^2}$      (A25)

    $\rho_2 = \frac{r^2}{6r^2 + 2q^2 + p^2} = \frac{-\theta_2}{1 + \theta_1^2 + \theta_2^2}$      (A26)

For optimal forecasts, the MSE of 1-step forecasts is

    $E(y_n - \hat{y}_{n-1}(1))^2 = V = \sigma_a^2$                      (A27)

Therefore from (A22)

    $6r^2 + 2q^2 + p^2 = (1 + \theta_1^2 + \theta_2^2)\,V$              (A28)

Using (A28) in (A25), (A26),

    $-4r^2 - q^2 = -\theta_1(1 - \theta_2)\,V$                          (A29)

    $r^2 = -\theta_2\,V$                                                (A30)

(3.22) follows from (A30) and (A21).
Solving (A29), (A30) for $q^2$,

    $q^2 = (4\theta_2 + \theta_1 - \theta_1\theta_2)\,V$                (A31)

Noting from (A21) that $-\theta_2 = 1 - G'$ and $\theta_1 = 2 - G' - H'$,

    $q^2 = (-4(1-G') + 2 - G' - H' + (1-G')(2-G'-H'))\,V$
    $\qquad = (-2H' + G'^2 + G'H')\,V$                                  (A32)

which agrees with (3.23). Finally from (A28), (A30), (A31),

    $p^2 = (1 + \theta_1^2 + \theta_2^2)\,V - 6r^2 - 2q^2$
    $\qquad = [1 + \theta_1^2 + \theta_2^2 + 6\theta_2 - 2(4\theta_2 + \theta_1 - \theta_1\theta_2)]\,V$
    $\qquad = [1 + \theta_1^2 + \theta_2^2 - 2(\theta_1 + \theta_2 - \theta_1\theta_2)]\,V$
    $\qquad = (1 - \theta_1 - \theta_2)^2\,V = H'^2\,V$

QED
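
The linear growth / IMA (0,2,2) correspondence can be verified with a few lines of arithmetic. The sketch below takes hypothetical values of $\theta_1$, $\theta_2$ and $V = \sigma_a^2$, builds $r^2$, $q^2$, $p^2$ from (A30), (A31) and the final expression for $p^2$, and checks them against the autocovariances (A22)-(A24) and against the relations $r^2 = (1-G')V$, $q^2 = (G'^2 + G'H' - 2H')V$, $p^2 = H'^2 V$ with $G' = 1 + \theta_2$, $H' = 1 - \theta_1 - \theta_2$.

```python
def variances_from_ima(theta1: float, theta2: float, V: float):
    """Noise variances implied by the IMA (0,2,2) coefficients."""
    r2 = -theta2 * V                                   # (A30)
    q2 = (4 * theta2 + theta1 - theta1 * theta2) * V   # (A31)
    p2 = (1 - theta1 - theta2) ** 2 * V                # final expression for p^2
    return r2, q2, p2


theta1, theta2, V = 1.2, -0.3, 2.0  # hypothetical coefficients and one-step MSE
r2, q2, p2 = variances_from_ima(theta1, theta2, V)

# Weights from (A21).
G = 1 + theta2
H = 1 - theta1 - theta2

# Autocovariances of Z from the process model: lags 0, 1, 2.
lag0 = 6 * r2 + 2 * q2 + p2
lag1 = -4 * r2 - q2
lag2 = r2
```

Note that $\theta_2$ must be negative for $r^2$ to be a valid variance, which is why a negative value is used in the example.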


Harrison [5], Cogger [3], and Goodman [4] have found some generalizations
relating higher order growth processes

    $y_n = x_n^{(1)} + \eta_n$

    $x_n^{(i)} = x_{n-1}^{(i)} + x_n^{(i+1)} + v_n^{(i)}$,   $i = 1, \ldots, k$

and the IMA (0,k,k) structure. Equations exist in the above references
for finding the relations among the optimal weighting factors G, H, ...;
the variances of the noise terms; and the IMA coefficients $\theta_i$. But explicit
useful formulas have not been presented. Harrison's general solutions,
for example, are not immediately translatable into formulae involving
these quantities.

More importantly, one should compare the power and flexibility of
the general vector model and its Kalman filter forecasts,

    $y_n = H_n \mathbf{x}_n + \eta_n$
                                                                        (A33)
    $\mathbf{x}_n = A_{n-1} \mathbf{x}_{n-1} + \Gamma_n \mathbf{v}_n$

with the general ARIMA model

    $\phi(B)\, y_n = \theta(B)\, a_n$                                   (A34)

where $\phi(\cdot)$, $\theta(\cdot)$ are polynomials in the backward shift operator B. It
would be interesting to show the equivalence of the forecasts in
steady-state operation.

The approach is as follows:

Let $y_n = x_n + b_n$; then from (A34)

    $\phi(B)\, x_n = \theta(B)\, a_n - \phi(B)\, b_n$                   (A35)

    $\qquad = \theta'(B)\, v_n$                                         (A36)

where $\theta'(B) v_n$ has the same statistical properties (auto-covariance) as the
RHS of (A35). Now (A36) can be put in the form of (A33), where
$\mathbf{x}_n = [x_n\ x_{n-1}\ \ldots\ x_{n-p+1}]^T$ and $\phi(B)$ is of degree p:

    $y_n = [1\ 0\ 0\ \ldots\ 0]\, \mathbf{x}_n + b_n$                   (A37)

    $\mathbf{x}_n = A(\phi)\, \mathbf{x}_{n-1} + \Gamma(\theta')\, v_n$

In steady state, Kalman filtering yields

    $\hat{\mathbf{x}}_n = A\hat{\mathbf{x}}_{n-1} + G\,(y_n - HA\hat{\mathbf{x}}_{n-1})$      (A38)

where G is a column vector, and

    $\hat{y}_n(1) = HA\hat{\mathbf{x}}_n$,   $\hat{y}_n(\ell) = HA^{\ell}\hat{\mathbf{x}}_n$      (A39)

In (A38), if the one-step-ahead forecast is truly the same as the B-J forecast,
the last term becomes $G\, a_n$ from (A13). Moreover, the Box-Jenkins
procedure satisfies

    $\hat{y}_n(\ell) = \hat{y}_{n-1}(\ell+1) + \psi_\ell\,(y_n - \hat{y}_{n-1}(1))$
    $\qquad = \hat{y}_{n-1}(\ell+1) + \psi_\ell\, a_n$                  (A40)

where $\psi_\ell$ is a function of the $\phi$, $\theta$ coefficients.

From (A38), (A39), (A40),

    $HA^{\ell}\hat{\mathbf{x}}_n = HA^{\ell+1}\hat{\mathbf{x}}_{n-1} + \psi_\ell\, a_n$      (A41)

The L.H.S. of (A41), using (A38), becomes $HA^{\ell}(A\hat{\mathbf{x}}_{n-1} + G a_n)$, and we see
that for equality in (A41)

    $HA^{\ell} G = \psi_\ell$                                           (A42)

Equation (A42) needs to be proved.
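
Although (A42) is left unproved in general, it can be checked for the linear growth case. The sketch below is only an illustration under stated assumptions: the state is taken as (level, growth) with $A = [[1, 1], [0, 1]]$ and $H = [1\ 0]$, the steady-state gain is $(G', H')^T$ from (A21), and the $\psi$ weights of the IMA (0,2,2) model are generated by matching coefficients in $\psi(B)(1-B)^2 = 1 - \theta_1 B - \theta_2 B^2$. The $\theta$ values and helper names are hypothetical.

```python
def psi_weights_ima022(theta1: float, theta2: float, max_lag: int):
    """psi_0 .. psi_max_lag for (1 - B)^2 y = (1 - theta1 B - theta2 B^2) a."""
    psi = [1.0, 2.0 - theta1, 3.0 - 2.0 * theta1 - theta2]
    while len(psi) <= max_lag:
        psi.append(2.0 * psi[-1] - psi[-2])  # homogeneous recursion for lags >= 3
    return psi[: max_lag + 1]


def h_a_power_g(gain, lag: int) -> float:
    """Compute H A^lag G for A = [[1, 1], [0, 1]] and H = [1, 0]."""
    level, growth = gain
    for _ in range(lag):
        level, growth = level + growth, growth  # one multiplication by A
    return level  # H picks out the first component


theta1, theta2 = 1.2, -0.3  # hypothetical IMA coefficients
gain = (1 + theta2, 1 - theta1 - theta2)  # (G', H') from (A21)

psi = psi_weights_ima022(theta1, theta2, 6)
lhs = [h_a_power_g(gain, lag) for lag in range(1, 7)]
```

For this model $HA^{\ell}G = G' + \ell H'$, which coincides with $\psi_\ell = (1 + \theta_2) + \ell(1 - \theta_1 - \theta_2)$ for $\ell \ge 1$.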

A.3 Kalman Versus Jewell's Credible Mean

Consider the process

    $y_n = x_n + \eta_n$,   $\mathrm{Var}(\eta_n) = r^2$
                                                                        (A43)
    $x_n = x_{n-1} = x$

The mean of the process does not change, i.e., $x_n = x$. Consider a
prior distribution on x, $\phi(x \mid \mu, \tau^2)$, with mean $\mu$ and variance $\tau^2$. Now
Jewell [6] defines a "mean-credible time constant"

    $N = \frac{E_\theta\, \mathrm{Var}(y \mid \theta)}{\mathrm{Var}_\theta\, E(y \mid \theta)}$      (A44)

where the process y depends on a parameter $\theta$ which has a prior distribution.
In this case, then,

    $N = \frac{E_x \mathrm{Var}(y \mid x)}{\mathrm{Var}_x E(y \mid x)} = \frac{E_x(r^2)}{\mathrm{Var}(x)} = \frac{r^2}{\tau^2}$      (A45)

Another interesting case of (A44) is when $\mathrm{Var}(y) = C\,E(y)$, where C =
constant VMR. Then

    $N = \frac{E_x(Cx)}{\mathrm{Var}(x)} = \frac{C\,E(x)}{\mathrm{Var}(x)} = \frac{\mathrm{VMR}(y)}{\mathrm{VMR}(x)}$

Jewell [6] shows that the best linear forecast, minimizing MSE, is

    $\hat{x}_n = \mu + \frac{nA}{1 + nA}\,(\bar{y} - \mu)$              (A46)

where $A = 1/N$ and $\bar{y}$ = sample mean = $\frac{1}{n}\sum_{i=1}^{n} y_i$.

Now we show that the same forecast is obtained after n steps in the
operation of the Kalman algorithm (2.3)-(2.7). In (2.5), for $q^2 = 0$, $r^2$ =
constant,

    $G_{i+1} = \frac{G_i}{1 + G_i}$                                     (A47)

Slightly modifying (2.6), (2.7) for consistent notation,

    $\hat{x}_1 = \mu + G_1 (y_1 - \mu)$

    $G_1 = \frac{\tau^2}{\tau^2 + r^2} = \frac{\tau^2/r^2}{\tau^2/r^2 + 1} = \frac{G_0}{G_0 + 1}$

Therefore $G_0 = \tau^2/r^2 = A$. We use an inductive proof.

    $\hat{x}_1 = \frac{1}{1 + G_0}\,\mu + \frac{G_0}{1 + G_0}\, y_1$

    $\hat{x}_2 = \frac{1}{1 + G_1}\,\hat{x}_1 + \frac{G_1}{1 + G_1}\, y_2$
    $\qquad = \frac{1}{1 + 2G_0}\,\mu + \frac{G_0}{1 + 2G_0}\, y_1 + \frac{G_0}{1 + 2G_0}\, y_2$
    $\qquad = \mu + \frac{2G_0}{1 + 2G_0}\left(\frac{y_1 + y_2}{2} - \mu\right)$      (A48)

(A48) is of the form (A46). Assume the form holds for the ith step:

    $\hat{x}_i = \frac{1}{1 + iG_0}\,\mu + \frac{iG_0}{1 + iG_0}\,\bar{y}_i$      (A49)

where $\bar{y}_i = \frac{1}{i}\sum_{j=1}^{i} y_j$.

Note that if $G_{i-1} = G_0/(1 + (i-1)G_0)$, then from (A47)

    $G_i = \frac{G_0/(1 + (i-1)G_0)}{1 + G_0/(1 + (i-1)G_0)} = \frac{G_0}{1 + iG_0}$

Now, from (2.4),

    $\hat{x}_{i+1} = (1 - G_{i+1})\,\hat{x}_i + G_{i+1}\, y_{i+1}$
    $\qquad = \frac{1 + iG_0}{1 + (i+1)G_0}\left(\frac{1}{1 + iG_0}\,\mu + \frac{iG_0}{1 + iG_0}\,\bar{y}_i\right) + \frac{G_0}{1 + (i+1)G_0}\, y_{i+1}$
    $\qquad = \frac{1}{1 + (i+1)G_0}\,\mu + \frac{(i+1)G_0}{1 + (i+1)G_0}\,\bar{y}_{i+1}$      (A50)

QED
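
The induction can be mirrored numerically. The following sketch, with hypothetical data and parameter values, runs the scalar recursion $G_{i+1} = G_i/(1+G_i)$ starting from $G_0 = \tau^2/r^2$ and compares the result after n observations with the credibility form $\mu + \frac{nA}{1+nA}(\bar{y} - \mu)$, $A = \tau^2/r^2$. Function names are illustrative only.

```python
def kalman_constant_mean(ys, mu, tau2, r2):
    """Recursive estimate of a constant mean (q^2 = 0) starting from prior (mu, tau2)."""
    g = tau2 / r2          # G_0
    x = mu
    for y in ys:
        g = g / (1 + g)    # gain recursion for q^2 = 0
        x = x + g * (y - x)
    return x


def credibility_forecast(ys, mu, tau2, r2):
    """Closed-form credibility estimate with A = tau^2 / r^2."""
    n = len(ys)
    a = tau2 / r2
    y_bar = sum(ys) / n
    return mu + (n * a) / (1 + n * a) * (y_bar - mu)


ys = [3.0, 5.0, 4.0, 6.0]  # hypothetical observations
recursive = kalman_constant_mean(ys, mu=2.0, tau2=4.0, r2=1.0)
closed_form = credibility_forecast(ys, mu=2.0, tau2=4.0, r2=1.0)
```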


A.4 Kalman Versus Regression Analysis, Least Squares & Minimum Variance
Estimators


Assume a process

    $D_n = a_n h_n + \nu_n$                                             (A51)

    $a_n = a_{n-1} + \xi_n$                                             (A52)

(A51) is a single variable regression equation. A vector version of the
Kalman algorithm can handle the multi-variable case.

Dividing (A51) by $h_n$,

    $D_n/h_n = y_n = a_n + \eta_n$                                      (A53)

where $\mathrm{Var}\,\eta_n = \mathrm{Var}\,\nu_n / h_n^2$. Assume a constant ratio

    $k = \frac{\mathrm{Var}\,\eta_n}{\mathrm{Var}\,\xi_n} = \frac{r_n^2}{q_n^2}$      (A54)

Assume $\mathrm{Var}\,\nu_n$ is not dependent on $h_n$ (homoscedasticity).
Then

    $r_n^2 h_n^2 = r_{n+1}^2 h_{n+1}^2 = \text{constant } C$            (A55)

Equation (2.5) then becomes

1+kG 

(A56) 

Now let $k \to \infty$, for which the process is now

    $y_n = a + \eta_n$                                                  (A57)

with $\mathrm{Var}\,\eta_n = C/h_n^2 = \mathrm{Var}\,y_n$.              (A58)

c _ s 

Gn+1  c ^2.  2 

WVi 

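(A59)

with weights $w_i$ and $\sum_i w_i = 1$.

To minimize the variance of $\hat{y}$, form a Lagrangian function

    $\min_{w,\lambda} \left( \sum_i w_i^2\, \mathrm{Var}\, y_i + \lambda\,(1 - \sum_i w_i) \right)$      (A63)

The conditions for a minimum are

    $2 w_i\, \mathrm{Var}\, y_i - \lambda = 0$   or   $w_i = \frac{\lambda}{2\, \mathrm{Var}\, y_i}$

and since $\sum_i w_i = 1$, $\frac{\lambda}{2} = \frac{1}{\sum_j 1/\mathrm{Var}\, y_j}$.

Hence

    $w_i = \frac{1/\mathrm{Var}\, y_i}{\sum_j 1/\mathrm{Var}\, y_j} = \frac{h_i^2/C}{\sum_j h_j^2/C}$   from (A58).

Note also that (A60) is in the form of a weighted moving average on y.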
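
The minimum variance weights can be illustrated directly. The sketch below assumes independent observations with $\mathrm{Var}\,y_i = C/h_i^2$ as in (A58), forms the weights $w_i = h_i^2 / \sum_j h_j^2$, and compares the variance of the weighted average with that of the unweighted mean. The values of h and C are hypothetical.

```python
def min_variance_weights(h, C=1.0):
    """Weights proportional to 1 / Var(y_i) = h_i^2 / C, normalised to sum to 1."""
    inverse_variances = [hi * hi / C for hi in h]
    total = sum(inverse_variances)
    return [v / total for v in inverse_variances]


def weighted_average_variance(weights, h, C=1.0):
    """Variance of sum(w_i * y_i) for independent y_i with Var(y_i) = C / h_i^2."""
    return sum(w * w * C / (hi * hi) for w, hi in zip(weights, h))


h = [1.0, 2.0, 4.0]  # hypothetical values of the regressor h_i
optimal = min_variance_weights(h)
equal = [1.0 / len(h)] * len(h)

var_optimal = weighted_average_variance(optimal, h)
var_equal = weighted_average_variance(equal, h)
```

For these inputs the optimal variance equals $C / \sum_j h_j^2 = 1/21$, smaller than the variance of the plain mean.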
A.5 Kalman Versus Spectral Analysis (Wiener-Hopf)


The Wiener filter problem, solved in the frequency domain, and the
Kalman filter problem, solved in the time domain (its power is most apparent
for non-stationary problems), must obtain identical results for stationary
systems with infinite observation time, since MSE is the minimization
criterion in both. I can only paraphrase (and therefore will not do so)
the note by R.J. Leake, "Duality Condition Established in the Frequency
Domain", IEEE Transactions on Automatic Control, 1965. A more obtuse
proof is given in Sage [9], Chapter 9.

APPENDIX B


MEAN SQUARE ERRORS OF MOVING AVERAGE FORECASTS ON THE DYNAMIC
MEAN & LINEAR GROWTH MODELS

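B.1 Dynamic Mean Case

    $y_n = x_n + \eta_n$,   $\mathrm{Var}\,\eta_n = r$

    $x_n = x_{n-1} + v_n$,   $\mathrm{Var}\,v_n = q$

For simplicity assume $x_0 = 0$. This doesn't affect generality. We
note some relations that will be useful.

    $V y_n = E(y_n^2) = V x_n + r$                                      (B1)
    $V x_n = V x_{n-1} + q = nq = E(x_n^2)$                             (B2)
    $E(y_n y_{n-k}) = E(x_n x_{n-k}) = V x_{n-k}$                       (B3)
    $E(y_n) = E(x_0) = 0$                                               (B4)
    $V(y_n + y_{n-1}) = E(y_n + y_{n-1})^2 = q + 2r + 4 V x_{n-1}$      (B5)

Now, for a moving average with base B,

    $y_{B+1} - \frac{1}{B}\sum_{j=1}^{B} y_j = x_{B+1} + \eta_{B+1} - \frac{1}{B}\left(\sum_{j=1}^{B}\eta_j + \sum_{j=1}^{B}(B+1-j)\,v_j\right)$      (B6)

    $\qquad = \eta_{B+1} - \frac{1}{B}\sum_{j=1}^{B}\eta_j + \frac{1}{B}\sum_{j=1}^{B+1}(j-1)\,v_j$      (B7)

    $V_1 = E(\cdot)^2 = r + \frac{r}{B} + \frac{q}{B^2}\sum_{j=1}^{B} j^2$      (B8)

    $V_1 = r + \frac{r}{B} + \frac{(B+1)(2B+1)}{6B}\, q$               (B9)

(B9) => Equation (2.23)   QED

To find $V_L$ in Equation (2.24), define

    $\epsilon_\ell = y_\ell - \hat{x}_0$                                (B10)

where $\hat{x}_0$ is the moving average forecast of $y_0$. Then

    $\epsilon_\ell = \epsilon_0 + (y_\ell - y_0)$                       (B11)

    $E(\epsilon_0^2) = V_1$   (see (B9))                                (B12)

    $E(\epsilon_\ell^2) = E(\epsilon_0^2) + 2E[\epsilon_0 (y_\ell - y_0)] + E(y_\ell - y_0)^2$      (B13)

The second term is

    $2E[(y_0 - \hat{x}_0)\, y_\ell - (y_0 - \hat{x}_0)\, y_0] = 2[V x_0 - E(\hat{x}_0 y_\ell) - (V x_0 + r) + E(\hat{x}_0 y_0)]$
    $\qquad = -2r$   using (B1) - (B5)                                  (B14)

The third term is

    $V x_\ell + r + V x_0 + r - 2V x_0 = \ell q + 2r$                   (B15)

Therefore

    $E(\epsilon_\ell^2) = E(\epsilon_0^2) + \ell q$                     (B16)

Similarly, for $k < \ell$, $k \ne 0$,

    $E(\epsilon_\ell \epsilon_k) = E[(\epsilon_0 + y_\ell - y_0)(\epsilon_0 + y_k - y_0)]$
    $\qquad = E(\epsilon_0^2) - r - r + E[(y_\ell - y_0)(y_k - y_0)]$
    $\qquad = E(\epsilon_0^2) - 2r + V x_k - 2V x_0 + V x_0 + r$
    $\qquad = E(\epsilon_0^2) - r + kq$                                 (B17)

    $E(\epsilon_\ell \epsilon_0) = E(\epsilon_0^2) + E[\epsilon_0 (y_\ell - y_0)] = E(\epsilon_0^2) - r$      (B18)

Using (B16), (B17), (B18) we can evaluate

    $V_{L+1} = E\left(\sum_{\ell=0}^{L} \epsilon_\ell\right)^2$         (B19)

    $\qquad = \sum_{\ell=0}^{L} E(\epsilon_\ell^2) + \sum_{\ell \ne k} E(\epsilon_\ell \epsilon_k)$      (B20)

There are L+1 terms in the first sum, such that

    $\sum_{\ell=0}^{L} \left(E(\epsilon_0^2) + \ell q\right) = (L+1)\,E(\epsilon_0^2) + \frac{L(L+1)}{2}\, q$      (B21)

There are 2L terms in the second sum where $\ell = 0$ or $k = 0$:

    $2L\,(E(\epsilon_0^2) - r)$                                         (B22)

There are $L^2 - L$ terms remaining, in which there are 2(L-k) pairs involving
$\epsilon_k$ for $k < \ell$. Hence

    $\sum_{k=1}^{L} 2(L-k)\left(E(\epsilon_0^2) - r + kq\right) = (E\epsilon_0^2 - r)\left(2L^2 - L(L+1)\right) + 2q \sum_{k=1}^{L} k(L-k)$
    $\qquad = (E\epsilon_0^2 - r)(L^2 - L) + \frac{(L-1)L(L+1)}{3}\, q$      (B23)

Combining (B21), (B22), (B23),

    $V_{L+1} = (L+1)^2\, V_1 - L(L+1)\, r + \frac{(2L+1)(L+1)L}{6}\, q$      (B24)

(B24) => Eqn (2.24)   QED.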
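
The two closed forms of Section B.1 can be cross-checked without simulation. The sketch below assumes the one-step and lead-time expressions $V_1 = r + r/B + \frac{(B+1)(2B+1)}{6B} q$ and $V_{L+1} = (L+1)^2 V_1 - L(L+1) r + \frac{L(L+1)(2L+1)}{6} q$, and compares them with a direct sum of squared coefficients on the independent noise terms. The direct computation is my own bookkeeping, not taken from the report: each of the B past observation noises carries weight $-(L+1)/B$, each of the L+1 future ones weight 1, and each random walk step carries the number of future periods it affects minus (L+1)/B times the number of averaged periods it affects.

```python
def v1_closed(B: int, r: float, q: float) -> float:
    """One-step MSE of a moving average with base B on the dynamic mean model."""
    return r + r / B + q * (B + 1) * (2 * B + 1) / (6 * B)


def v_lead_closed(B: int, L: int, r: float, q: float) -> float:
    """MSE of the summed forecast error over a lead time of L + 1 periods."""
    return ((L + 1) ** 2 * v1_closed(B, r, q)
            - L * (L + 1) * r
            + q * L * (L + 1) * (2 * L + 1) / 6)


def v_lead_direct(B: int, L: int, r: float, q: float) -> float:
    """Same quantity from the squared coefficients of the independent noises."""
    total = 0.0
    # Observation noise: B averaged periods and L + 1 forecast periods.
    total += r * (B * ((L + 1) / B) ** 2 + (L + 1))
    # Random walk steps inside the averaging window and into the first forecast period.
    total += q * sum(((L + 1) * m / B) ** 2 for m in range(1, B + 1))
    # Random walk steps occurring during the remaining lead time.
    total += q * sum((L + 1 - i) ** 2 for i in range(1, L + 1))
    return total


# Hypothetical parameter grid for the comparison.
checks = [
    abs(v_lead_closed(B, L, r=1.5, q=0.4) - v_lead_direct(B, L, r=1.5, q=0.4))
    for B in (1, 3, 6, 12)
    for L in (0, 1, 4, 9)
]
```

With L = 0 both functions reduce to the one-step expression, and with B = 1 the one-step MSE is 2r + q, the error variance of the naive last-value forecast.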




B.2 Linear Growth with Constant $\beta$


    $y_n = x_n + \eta_n$

    $x_n = x_{n-1} + \beta + v_n$


Expression (B6) is augmented by a term


B-l 


[ (B+l)/2  - ( — ^ J + |S  )]  for  which  the  expected  square 


. ,B+1  A . 2 _ r 

la 


(B25) 


(B25) + (B9) => (3.27)   QED


To find $V_L$ we proceed as in Section B.1.


l-l 


E(x4  +'\Jl  - xq)Z  - E(xq  - xq  + ifl  + V,  + .5  v^)' 


- E(xo-xo)2  + r +jtq  + jV  + 2 Jf  ft  E(xa  - 


(B26) 


First  3 terms  can  be  expressed  using  (B16)  and  (B25)  as 


B-l 


E(f*i«o  + **  1 *nd  E(xo"*o)  " B/J_  % J 


E(f^)  - E(tf)  + + 2/lH*/'-  /1(B-1)(B)/2B) 


" _0  ♦ + /*2  * «+B+l) 


(B27)


4 


/-I 


E<V0>  * E«*0  ‘ Xo  +*<>+  ■>'*+  l Vj)(xo  - *o  + V1 
“ E(W„.o  + l0  + ^’E(xo  ' Xo) 


, 2 , B+l 
■'  A 2 


(B28) 


k-1 


E(fty  - E(xo  - xo  + *£  + 1(  + ~ Vj)(xo  - xo  + k/5+\  + Z Vj) 


E(f^k^-o+  ^+^+(Wf 


(B29) 


(B29) is found by expansion analogously to (B28).


For a leadtime L+1, we have additional terms in (B19).
From  (B27) 


2 L 

1)  (t  r Jf  (( +B+1)  “ 
o 


(L+l&fcW  + (B+1)  LiktU  ! (,  2 

6 2 J 


From  (B28) 


ID  ,2  r L ia-  - /(«!)  tip- 

o 


From  (B29)  L 

in)  «2  2 :r  2‘  ak  + jtk  ^) 

/ -1  k-1 


■ if  r /P- 


• |i2  1“  u3  - l2  ♦ (»*1)  3,V  2 ] 
1 


,2  L2a+1)2  La+1)(2L+L)  , fn,n  (L+L)L(2L+1)  - L(L+11 

Pa”  f,  \nrri/  A 


iv)  (L+l)2

APPENDIX C


STATISTICAL PROCEDURES FOR OBTAINING k-FACTORS

As evidenced from equations (2.12), (2.26), (3.26) and (2.5) for
$r_n^2$, $q_n^2$ constant, the process parameter $k = r^2/q^2$ is quite important as
a parameter of the forecasting algorithms. One would like to know the
"k-factor" for each individual time series (item) being generated by
the process models discussed in Chapters III and IV and forecast future
periods' observables accordingly. It is usually not feasible or practical
to obtain k-factors for each individual item: infeasible because there is
often not enough data to use a portion for obtaining k-estimates and
the remainder for forecast testing; impractical because for an inventory
of many items one may be only able to retain, in the forecast system,
parameters for classes of items.

During computer runs using MA algorithms with base B, squared errors
can be averaged over the time horizon for each item i to provide statistical
estimates $\hat{V}_{1i}(B)$, $\hat{V}_{Li}(B)$. Equations (2.23), (2.24) or (3.27), (3.28) can
be manipulated to obtain estimates of k. Items are stratified by some
classification scheme into cells, and several methods for using the cell
averages of $\hat{V}_1$, $\hat{V}_L$, $\hat{k}$ are posed.

All the methods below except e. assume a dynamic mean model, i.e.,
(2.23) and (2.24) are used to find k.

a.  Determine $\hat{k}_i = f\left(\hat{V}_{Li}(B) / \hat{V}_{1i}(B)\right)$ or $\hat{k}_i = g\left(\hat{V}_{1i}(B), \hat{V}_{1i}(B')\right)$,
    with B, B', L in the equations being known.
    Then average $\hat{k}_i$ over the items in the cell to give $\hat{k}$.

b.  Let $\bar{V}_1(B)$ = average over items in cell of $\hat{V}_{1i}(B)$.
    Then $\hat{k} = g\left(\bar{V}_1(B), \bar{V}_1(B')\right)$.

c.  Let $\bar{R}$ = average ratio over items in cell of $\hat{V}_{Li}(B) / \hat{V}_{1i}(B)$.
    Then $\hat{k} = f(\bar{R})$.

d.  $\hat{k} = f\left(\bar{V}_L(B) / \bar{V}_1(B)\right)$.

e.  This is used with the log transformation, assuming a dynamic
    proportion model, where $\beta^2 = (-\tfrac{1}{2} q^2)^2$ in equations (3.27), (3.28).
    For each cell, solve for $\hat{k}$, $\hat{q}^2$:

        $\bar{V}_1(B)$ = function of $(k, q^2)$, eqn. (3.27)

        $\bar{V}_L(B)$ = function of $(k, q^2)$, eqn. (3.28)

We discuss the problem in the context of demand over time for an item.
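
To make the function f in the ratio methods concrete, the sketch below inverts the ratio of lead-time MSE to one-step MSE for k. It is a sketch under stated assumptions: (2.23) and (2.24) are taken in the forms $V_1 = r + r/B + \frac{(B+1)(2B+1)}{6B} q$ and $V_{L+1} = (L+1)^2 V_1 - L(L+1) r + \frac{L(L+1)(2L+1)}{6} q$, with $r = kq$, and the ratio is indexed by a lead time of L+1 periods, which may differ from the report's indexing of $V_L$. Since q cancels in the ratio, all quantities are expressed in units of q. Function names are hypothetical.

```python
def v1_over_q(B: int, k: float) -> float:
    """One-step MSE divided by q, with r = k q."""
    return k * (1 + 1 / B) + (B + 1) * (2 * B + 1) / (6 * B)


def v_lead_over_q(B: int, L: int, k: float) -> float:
    """Lead-time (L + 1 periods) MSE divided by q, with r = k q."""
    return ((L + 1) ** 2 * v1_over_q(B, k)
            - L * (L + 1) * k
            + L * (L + 1) * (2 * L + 1) / 6)


def k_from_ratio(R: float, B: int, L: int) -> float:
    """Solve R = V_{L+1} / V_1 for k; the relation is linear in k."""
    a = 1 + 1 / B
    c = (B + 1) * (2 * B + 1) / (6 * B)
    d = L * (L + 1) * (2 * L + 1) / 6
    return (((L + 1) ** 2 - R) * c + d) / ((R - (L + 1) ** 2) * a + L * (L + 1))


# Round trip with hypothetical values: start from k, form the ratio, recover k.
k_true, B, L = 3.0, 6, 2
R = v_lead_over_q(B, L, k_true) / v1_over_q(B, k_true)
k_recovered = k_from_ratio(R, B, L)
```

Because the denominator can be close to zero or negative for noisy sample ratios, an estimate obtained this way can be unstable or negative.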

Discussion:

The methods above were investigated in Orr [8] for the particular
problems of obtaining k-factors by item class for four time series:
Demand D, Demand per flying hour D/H, log D, and log D/H. The criterion
for selection of a method and for selection of an item stratification
scheme was evidence of a pattern of k values over item classes.

a.  This method was not investigated. Variability in $\hat{V}_{1i}$, $\hat{V}_{Li}$
    for individual items is large; $\hat{k}_i$ would be suspect. Also a few very
    large $\hat{k}_i$'s would dominate the cell average.

b.  This method was tested in a simulation* against method
    c. and fared worse; the sample variance of $\hat{k}$ was larger and the sample mean of
    $\hat{k}$ was further from the true k.

c.  This procedure was adopted for obtaining $\hat{k}$ by cells for the
    D and D/H series. Reasonable k-patterns were obtained for several
    stratification schemes using real data for ~10,000 items.

*   A dynamic mean model with a given k generated observations; moving
    averages with given bases were applied for forecasting, and statistical
    estimates of MSE were used in equations to find $\hat{k}$. Monte Carlo
    replications gave statistics on the mean and variance of $\hat{k}$.

d.  This procedure was tried using a stratification scheme
    which had done well in method c. It fared badly, yielding negative k's
    and no pattern.

e.  This procedure was used in obtaining $\hat{k}$, $\hat{q}^2$ by cells for the
    log D and log D/H series. Patterns were obtained for the same strata
    (with method c.) used on the D and D/H series.